Riya Verma created SPARK-48931:
----------------------------------

             Summary: Reduce Cloud Store List API cost for state store maintenance task
                 Key: SPARK-48931
                 URL: https://issues.apache.org/jira/browse/SPARK-48931
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 3.4.3
            Reporter: Riya Verma


Currently, the state store maintenance task decides which old version files 
of the RocksDB state store to delete by listing all existing snapshot version 
files in the checkpoint directory. By default, this happens every minute. 
These frequent LIST calls against cloud object stores have resulted in high 
costs, as reported by many customers. To reduce the cost of state store 
maintenance, we should minimize how often the maintenance task lists the 
object store. To amortize the cost of each LIST call, we will accumulate 
versions to delete and only call list inside deleteOldVersions when the 
number of versions to delete reaches a configured threshold. 
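
As a rough illustration of the proposed behavior (not the actual Spark 
implementation), the following Python sketch simulates a maintenance task 
that skips the LIST call until enough stale versions have accumulated. All 
names here (FakeObjectStore, MaintenanceTask, min_versions_to_delete, 
num_versions_to_retain, simulate) are hypothetical and chosen only for this 
example; the real change would live in the RocksDB state store code path 
around deleteOldVersions.

```python
class FakeObjectStore:
    """In-memory stand-in for a cloud object store that counts LIST calls."""

    def __init__(self):
        self.files = set()    # snapshot versions currently stored
        self.list_calls = 0   # number of (billable) LIST requests issued

    def put(self, version):
        self.files.add(version)

    def list_versions(self):
        self.list_calls += 1
        return sorted(self.files)

    def delete(self, version):
        self.files.discard(version)


class MaintenanceTask:
    """Hypothetical task that defers LIST until enough versions are stale."""

    def __init__(self, store, num_versions_to_retain, min_versions_to_delete):
        self.store = store
        self.num_versions_to_retain = num_versions_to_retain
        self.min_versions_to_delete = min_versions_to_delete
        # Oldest version believed to still exist; refreshed only after a LIST.
        self.min_seen_version = None

    def delete_old_versions(self, latest_version):
        # Versions at or below this bound are eligible for deletion.
        max_stale = latest_version - self.num_versions_to_retain
        if self.min_seen_version is not None:
            pending = max_stale - self.min_seen_version + 1
            if pending < self.min_versions_to_delete:
                return 0  # too few stale versions accumulated: skip the LIST
        deleted = 0
        for version in self.store.list_versions():
            if version <= max_stale:
                self.store.delete(version)
                deleted += 1
        # Everything up to max_stale is gone; version numbers start at 1.
        self.min_seen_version = max(max_stale + 1, 1)
        return deleted


def simulate(num_batches, num_versions_to_retain, min_versions_to_delete):
    """Commit one version per batch and run maintenance after each commit."""
    store = FakeObjectStore()
    task = MaintenanceTask(store, num_versions_to_retain, min_versions_to_delete)
    for version in range(1, num_batches + 1):
        store.put(version)
        task.delete_old_versions(version)
    return store.list_calls, sorted(store.files)
```

In this toy model, raising the threshold trades a few extra retained files 
in the checkpoint directory for fewer LIST requests: stale versions remain 
on storage until the threshold is reached, so the threshold bounds the 
extra space used.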



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

