Hi, We are planning to work on a feature that improves how the RocksDB State Store organizes checkpoint versions in cloud storage. This improvement aims to:
- Address some unexpected query results (see the Appendix <https://docs.google.com/document/d/1uWRMbN927cRXhSm5oeV3pbwb6o73am4r1ckEJDhAHa0/edit#heading=h.ehd1najwmeos> of the design document for examples). - Reduce the likelihood of RocksDB corruptions caused by checkpointing a new version based on a previous version that is not the official version. The first section <https://docs.google.com/document/d/1pT5bwW325VndVH09aZd7TykKXbBP5uCrNIQOC0uDoLc/edit#heading=h.z2wsnxxpksge> of the documentation explains how this can happen. The core idea is to ensure that the state store checkpoint for a batch version strictly depends on the previously checkpointed version. We propose achieving this by introducing a unique ID for each state store checkpoint and persisting this ID in the commit logs. This feature will be controlled by a new SQL configuration, allowing users to switch between the old and new modes when running a supported release. You can find the design document here <https://docs.google.com/document/d/1pT5bwW325VndVH09aZd7TykKXbBP5uCrNIQOC0uDoLc/edit> and the related JIRA ticket SPARK-49374 <https://issues.apache.org/jira/browse/SPARK-49374>. As we believe this improvement isn't significant enough to require going through the SPIP process, we don't plan to pursue that route. However, if you have any concerns, please let us know. We would also greatly appreciate any comments or feedback on the design. Best regards, Siying