[ 
https://issues.apache.org/jira/browse/SPARK-48931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riya Verma updated SPARK-48931:
-------------------------------
    Description: Currently, during the state store maintenance process, we find 
which old version files of the RocksDB state store to delete by listing all 
existing snapshotted version files in the checkpoint directory every 1 minute 
by default. The frequent list calls in the cloud can result in high costs. To 
address this concern and reduce the cost associated with state store 
maintenance, we should aim to minimize the frequency of listing object stores 
inside the maintenance task. To minimize the frequency, we will try to 
accumulate versions to delete and only call list inside *deleteOldVersions* 
when the number of versions to delete reaches a configured threshold.   (was: 
Currently, during the state store maintenance process, we find which old 
version files of the RocksDB state store to delete by listing all existing 
snapshotted version files in the checkpoint directory every 1 minute by 
default. The frequent list calls in the cloud can result in high costs. To 
address this concern and reduce the cost associated with state store 
maintenance, we should aim to minimize the frequency of listing object stores 
inside the maintenance task. To minimize the frequency, we will try to 
accumulate versions to delete and only call list inside deleteOldVersions when 
the number of versions to delete reaches a configured threshold. )

> Reduce Cloud Store List API cost for state store maintenance task
> -----------------------------------------------------------------
>
>                 Key: SPARK-48931
>                 URL: https://issues.apache.org/jira/browse/SPARK-48931
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.4.3
>            Reporter: Riya Verma
>            Priority: Major
>
> Currently, during the state store maintenance process, we find which old 
> version files of the RocksDB state store to delete by listing all existing 
> snapshotted version files in the checkpoint directory every 1 minute by 
> default. The frequent list calls in the cloud can result in high costs. To 
> address this concern and reduce the cost associated with state store 
> maintenance, we should aim to minimize the frequency of listing object stores 
> inside the maintenance task. To minimize the frequency, we will try to 
> accumulate versions to delete and only call list inside *deleteOldVersions* 
> when the number of versions to delete reaches a configured threshold. 
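
The accumulate-then-list idea described above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the Spark
implementation: the names InMemoryStore, ThresholdedCleaner,
min_versions_to_delete, versions_to_retain and simulate are all hypothetical,
and the object store is replaced by an in-memory set that counts list calls.
The cleaner estimates how many versions have become deletable from state it
already holds, and only issues the expensive list call once that estimate
reaches the threshold.

```python
class InMemoryStore:
    """Hypothetical stand-in for a cloud checkpoint directory."""

    def __init__(self):
        self.versions = set()
        self.list_calls = 0  # counts the (expensive) list operations

    def list_versions(self):
        self.list_calls += 1
        return sorted(self.versions)

    def delete(self, version):
        self.versions.discard(version)


class ThresholdedCleaner:
    """Hypothetical cleaner that defers listing until enough versions are stale."""

    def __init__(self, store, min_versions_to_delete, versions_to_retain):
        self.store = store
        self.min_versions_to_delete = min_versions_to_delete
        self.versions_to_retain = versions_to_retain
        # Oldest version believed to still exist; learned from the last list call.
        self.oldest_known_version = None

    def maintain(self, latest_version):
        """Run one maintenance tick; return how many versions were deleted."""
        cutoff = latest_version - self.versions_to_retain
        if self.oldest_known_version is not None:
            # Estimate the deletable count without touching the object store.
            estimated = cutoff - self.oldest_known_version + 1
            if estimated < self.min_versions_to_delete:
                return 0  # below threshold: skip the list call entirely
        versions = self.store.list_versions()
        stale = [v for v in versions if v <= cutoff]
        for v in stale:
            self.store.delete(v)
        remaining = [v for v in versions if v > cutoff]
        self.oldest_known_version = remaining[0] if remaining else latest_version + 1
        return len(stale)


def simulate(ticks, min_versions_to_delete, versions_to_retain):
    """Commit one version per tick, run maintenance, report list calls and survivors."""
    store = InMemoryStore()
    cleaner = ThresholdedCleaner(store, min_versions_to_delete, versions_to_retain)
    for version in range(1, ticks + 1):
        store.versions.add(version)
        cleaner.maintain(version)
    return store.list_calls, sorted(store.versions)
```

In this toy model, simulate(8, 3, 2) issues 3 list calls over 8 maintenance
ticks and leaves versions [7, 8], whereas listing on every tick would issue 8.
The trade-off is that up to min_versions_to_delete - 1 stale versions can
linger in the checkpoint directory between list calls.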



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

