[ https://issues.apache.org/jira/browse/SPARK-46796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-46796: ----------------------------------- Labels: pull-request-available (was: ) > RocksDB versionID Mismatch in SST files > --------------------------------------- > > Key: SPARK-46796 > URL: https://issues.apache.org/jira/browse/SPARK-46796 > Project: Spark > Issue Type: Bug > Components: Structured Streaming > Affects Versions: 3.4.2, 3.4.1, 3.5.0, 4.0.0, 3.5.1, 3.5.2 > Reporter: Bhuwan Sahni > Priority: Major > Labels: pull-request-available > > We need to ensure that the correct SST files are used on executor during > RocksDB load as per mapping in metadata.zip. With current implementation, its > possible that the executor uses a SST file (with a different UUID) from a > older version which is not the exact file mapped in the metadata.zip. This > can cause version Id mismatch errors while loading RocksDB leading to > streaming query failures. > Few scenarios in which such a situation can occur are: > **Scenario 1 - Distributed file system does not support overwrite > functionality** > # A task T1 on executor A commits Rocks Db snapshot for version X. > # Another task T2 on executor A loads version X-1, and tries to commit X. > During commit, SST files are copied but metadata file is not overwritten. > # Task T3 is scheduled on A, this task reuses previously loaded X (loaded in > (2) above) and commits X+1. > # Task T4 is scheduled on A again for state store version X. The executor > deletes SST files corresponding to commit X+1, downloads the metadata for > version X (which was committed in task T1), and loads RocksDB. This would > fail because the metadata in (1) is not compatible with SST files in (2). > > **Scenario 2 - Multiple older State versions have different DFS files for a > particular SST file.** > In the current logic, we look at all the versions older than X to find if a > local SST file can be reused. The reuse logic only ensures that the local SST > file was present in any of the previous version. However, its possible that 2 > different older versions had a different SST file (`0001-uuid1.sst` and > `0001-uuid2.sst`) uploaded on DFS. These SST files will have the same local > name (with UUID truncated) and size, but are not compatible due to different > RocksDB Version Ids. We need to ensure that the correct SST file (as per > UUID) is picked as mentioned in the metadata.zip. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org