[ https://issues.apache.org/jira/browse/SPARK-48931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-48931:
-----------------------------------
Labels: pull-request-available (was: )
> Reduce Cloud Store List API cost for state store maintenance task
> -----------------------------------------------------------------
>
> Key: SPARK-48931
> URL: https://issues.apache.org/jira/browse/SPARK-48931
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.4.3
> Reporter: Riya Verma
> Priority: Major
> Labels: pull-request-available
>
> Currently, the state store maintenance task decides which old RocksDB state
> store version files to delete by listing all existing snapshotted version
> files in the checkpoint directory. By default it does this once per minute.
> On cloud object stores, such frequent list calls can be costly. To reduce
> the cost of state store maintenance, we should minimize how often the
> maintenance task lists the object store. We will accumulate the versions to
> delete and call list only when their number reaches a configured threshold.
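
The proposal above can be illustrated with a minimal Python sketch. It is not
Spark's implementation: the class name, the callback parameters, and the
settings num_versions_to_retain and min_versions_to_delete are all
hypothetical, chosen only to show how a threshold can gate the list call.

```python
from typing import Callable, List, Optional, Set, Tuple


class ThresholdedCleaner:
    """Hypothetical sketch: list the store only when enough stale versions accumulate."""

    def __init__(
        self,
        list_versions: Callable[[], List[int]],
        delete_versions: Callable[[List[int]], None],
        num_versions_to_retain: int,
        min_versions_to_delete: int,
    ) -> None:
        self.list_versions = list_versions
        self.delete_versions = delete_versions
        self.num_versions_to_retain = num_versions_to_retain
        self.min_versions_to_delete = min_versions_to_delete
        # Oldest version believed to still exist; unknown until the first listing.
        self.oldest_tracked_version: Optional[int] = None
        self.list_calls = 0

    def maintain(self, latest_version: int) -> None:
        """Run one maintenance tick for the given latest committed version."""
        min_version_to_retain = latest_version - self.num_versions_to_retain + 1
        if self.oldest_tracked_version is not None:
            # Estimate how many stale versions have piled up without listing.
            pending = min_version_to_retain - self.oldest_tracked_version
            if pending < self.min_versions_to_delete:
                return  # below the threshold: skip the costly list call
        self.list_calls += 1
        existing = self.list_versions()
        stale = [v for v in existing if v < min_version_to_retain]
        if stale:
            self.delete_versions(stale)
        remaining = [v for v in existing if v >= min_version_to_retain]
        self.oldest_tracked_version = (
            min(remaining) if remaining else min_version_to_retain
        )


def simulate(num_commits: int, retain: int, threshold: int) -> Tuple[int, Set[int]]:
    """Commit versions 1..num_commits against an in-memory stand-in for the store."""
    store: Set[int] = set()
    cleaner = ThresholdedCleaner(
        list_versions=lambda: sorted(store),
        delete_versions=lambda versions: store.difference_update(versions),
        num_versions_to_retain=retain,
        min_versions_to_delete=threshold,
    )
    for version in range(1, num_commits + 1):
        store.add(version)
        cleaner.maintain(version)  # one maintenance tick per commit
    return cleaner.list_calls, store


if __name__ == "__main__":
    calls, remaining = simulate(num_commits=20, retain=2, threshold=5)
    print(calls, sorted(remaining))
```

In this toy run, 20 maintenance ticks issue 4 list calls instead of 20. The
trade-off is that up to threshold - 1 stale versions stay in the store between
cleanups, so the threshold trades storage for fewer list requests.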