[
https://issues.apache.org/jira/browse/SPARK-48931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Riya Verma updated SPARK-48931:
-------------------------------
Description: Currently, during state store maintenance, we find which old
version files of the RocksDB state store to delete by listing all existing
snapshotted version files in the checkpoint directory, every 1 minute by
default. Such frequent list calls against cloud object stores can result in
high costs. To reduce the cost of state store maintenance, we should minimize
how often the maintenance task lists the object store. To do so, we will
accumulate versions to delete and only call list inside *deleteOldVersions*
when the number of versions to delete reaches a configured threshold.
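
The batching idea in the description can be sketched roughly as follows. This
is only an illustration in Python, not Spark's actual implementation: the
class BatchedCleaner, its parameters num_versions_to_retain and
min_versions_to_delete, and the in-memory dict standing in for the object
store are all hypothetical names chosen for the example. The sketch estimates
how many versions have become stale since the last cleanup and skips the list
call until that estimate reaches the threshold.

```python
class BatchedCleaner:
    """Hypothetical sketch: defer the costly list call until enough
    stale versions have accumulated. Not Spark's real code."""

    def __init__(self, store, num_versions_to_retain, min_versions_to_delete):
        self.store = store                    # dict: version -> file (stand-in for object store)
        self.retain = num_versions_to_retain  # newest versions that must be kept
        self.threshold = min_versions_to_delete
        self.oldest_tracked = None            # oldest version believed to still exist
        self.list_calls = 0                   # counts simulated list API calls

    def _list_versions(self):
        # Stand-in for the expensive cloud list operation.
        self.list_calls += 1
        return sorted(self.store)

    def delete_old_versions(self, latest_version):
        min_to_keep = latest_version - self.retain + 1
        if self.oldest_tracked is not None:
            # Estimate stale versions without listing anything.
            stale = min_to_keep - self.oldest_tracked
            if stale < self.threshold:
                return []                     # below threshold: skip the list call
        versions = self._list_versions()
        to_delete = [v for v in versions if v < min_to_keep]
        for v in to_delete:
            del self.store[v]
        remaining = [v for v in versions if v >= min_to_keep]
        self.oldest_tracked = remaining[0] if remaining else min_to_keep
        return to_delete


# Simulate 12 commits with one maintenance run after each commit.
store = {}
cleaner = BatchedCleaner(store, num_versions_to_retain=2, min_versions_to_delete=5)
for version in range(1, 13):
    store[version] = f"{version}.zip"
    cleaner.delete_old_versions(version)

print(cleaner.list_calls, sorted(store))  # 3 [11, 12]
```

In this toy run the maintenance step executes 12 times but lists only 3 times
(once to learn the oldest existing version, then once per batch of 5 stale
versions). The trade-off is that up to threshold - 1 extra old versions stay
in storage between cleanups.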
> Reduce Cloud Store List API cost for state store maintenance task
> -----------------------------------------------------------------
>
> Key: SPARK-48931
> URL: https://issues.apache.org/jira/browse/SPARK-48931
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.4.3
> Reporter: Riya Verma
> Priority: Major