Riya Verma created SPARK-48931:
----------------------------------
Summary: Reduce Cloud Store List API cost for state store maintenance task
Key: SPARK-48931
URL: https://issues.apache.org/jira/browse/SPARK-48931
Project: Spark
Issue Type: Improvement
Components: Structured Streaming
Affects Versions: 3.4.3
Reporter: Riya Verma
Currently, during state store maintenance, we determine which old version
files of the RocksDB state store to delete by listing all existing
snapshotted version files in the checkpoint directory. This happens every 1
minute by default. Many customers have reported that these frequent LIST calls
against cloud object stores have resulted in high costs. To reduce the cost of
state store maintenance, we should call LIST less often inside the maintenance
task. To amortize the cost of each LIST call, we will accumulate the versions
that are eligible for deletion and only call LIST inside deleteOldVersions once
the number of versions to delete reaches a configured threshold.
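A minimal sketch of the proposed batching, written in Python for brevity and
assuming that versions are consecutive integers starting at 1 and that
maintenance runs once per commit (in Spark it runs on a timer). The names
FakeCheckpointStore, AmortizedCleaner, min_versions_to_retain and
delete_threshold are hypothetical and are not Spark APIs or configs. The real
RocksDB state store also tracks changelog and SST files, which this ignores.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCheckpointStore:
    """Hypothetical stand-in for a cloud object store that counts LIST calls."""
    versions: set = field(default_factory=set)
    list_calls: int = 0

    def list_versions(self):
        self.list_calls += 1  # each call models one billable LIST request
        return sorted(self.versions)

    def delete(self, version):
        self.versions.discard(version)


class AmortizedCleaner:
    """Hypothetical sketch: defer LIST until enough versions are deletable."""

    def __init__(self, store, min_versions_to_retain, delete_threshold):
        self.store = store
        self.min_versions_to_retain = min_versions_to_retain
        self.delete_threshold = delete_threshold
        self.last_cutoff = 0  # versions <= last_cutoff were already deleted

    def maintenance(self, latest_version):
        # Versions <= cutoff are obsolete; the newest min_versions_to_retain are kept.
        cutoff = latest_version - self.min_versions_to_retain
        pending = cutoff - self.last_cutoff  # newly obsolete since last cleanup
        if pending < self.delete_threshold:
            return 0  # skip the LIST call entirely this round
        deleted = 0
        for version in self.store.list_versions():
            if version <= cutoff:
                self.store.delete(version)
                deleted += 1
        self.last_cutoff = cutoff
        return deleted


def simulate(delete_threshold, commits=100, min_versions_to_retain=10):
    store = FakeCheckpointStore()
    cleaner = AmortizedCleaner(store, min_versions_to_retain, delete_threshold)
    for version in range(1, commits + 1):
        store.versions.add(version)  # each commit writes a new snapshot version
        cleaner.maintenance(version)
    return store
```

With these parameters, simulate(1) (list on every maintenance run that finds
something to delete) issues 90 LIST calls, while simulate(20) issues 4. The
trade-off is that up to delete_threshold - 1 obsolete versions can remain in
storage between cleanups.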
--
This message was sent by Atlassian Jira
(v8.20.10#820010)