Bhuwan Sahni created SPARK-47568:
------------------------------------
Summary: Fix race condition between maintenance thread and task
thread for RocksDB snapshot
Key: SPARK-47568
URL: https://issues.apache.org/jira/browse/SPARK-47568
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 3.5.1, 3.5.0, 4.0.0, 3.5.2
Reporter: Bhuwan Sahni
There are currently some race conditions between the maintenance thread and the
task thread, which can result in corrupted checkpoint state.
# The maintenance thread currently relies on the class variable {{latestSnapshot}}
to find the latest checkpoint and upload it to DFS. The task thread can modify
this checkpoint at commit time if a new snapshot is created.
# The task thread does not reset {{latestSnapshot}} at load time. If an older
version is loaded, snapshots of newer versions can still be considered valid and
uploaded to DFS. This results in VersionIdMismatch errors.
This issue proposes to fix both problems by guarding modifications of the
{{latestSnapshot}} variable and setting {{latestSnapshot}} properly at load time.
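
The two fixes described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Spark change: the classes Snapshot and SnapshotHolder and the methods onLoad, onCommit and takeForUpload are hypothetical names, and resetting the slot to empty at load time is one assumed reading of "setting latestSnapshot properly". The idea is that every access to the shared slot goes through a single atomic operation, so the maintenance thread never observes a snapshot while the task thread is replacing it.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a RocksDB snapshot; not Spark's actual class. */
final class Snapshot {
    final long version;

    Snapshot(long version) {
        this.version = version;
    }
}

/** Hypothetical holder for the shared snapshot slot used by both threads. */
final class SnapshotHolder {
    // Single shared slot; all reads and writes are atomic operations.
    private final AtomicReference<Snapshot> latestSnapshot = new AtomicReference<>(null);

    /** Task thread, at load time: drop any snapshot left over from an earlier load. */
    void onLoad() {
        latestSnapshot.set(null);
    }

    /**
     * Task thread, at commit time: publish the newly created snapshot.
     * Returns the snapshot it replaced (possibly null) so the caller can clean it up.
     */
    Snapshot onCommit(Snapshot created) {
        return latestSnapshot.getAndSet(created);
    }

    /**
     * Maintenance thread: atomically take ownership of the pending snapshot.
     * Returns null when there is nothing to upload.
     */
    Snapshot takeForUpload() {
        return latestSnapshot.getAndSet(null);
    }
}
```

With this shape, a snapshot created for version 5 is discarded when an older version is loaded afterwards, so the maintenance thread cannot upload it on behalf of the reloaded store. A lock around a plain field would work as well; the atomic reference is used here only to keep the sketch short.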