[
https://issues.apache.org/jira/browse/SPARK-47568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-47568:
-----------------------------------
Labels: pull-request-available (was: )
> Fix race condition between maintenance thread and task thread for RocksDB
> snapshot
> ---------------------------------------------------------------------------------
>
> Key: SPARK-47568
> URL: https://issues.apache.org/jira/browse/SPARK-47568
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 3.5.0, 4.0.0, 3.5.1, 3.5.2
> Reporter: Bhuwan Sahni
> Priority: Major
> Labels: pull-request-available
>
> There are currently race conditions between the maintenance thread and the
> task thread that can result in corrupted checkpoint state.
> # The maintenance thread currently relies on the class variable
> {{lastSnapshot}} to find the latest checkpoint and upload it to DFS. The task
> thread can modify this checkpoint at commit time if a new snapshot is created.
> # The task thread does not reset {{lastSnapshot}} at load time. If an older
> version is loaded, snapshots from newer versions can still be considered
> valid and uploaded to DFS, which results in VersionIdMismatch errors.
> This issue proposes to fix both problems by guarding modification of the
> {{latestSnapshot}} variable and setting {{latestSnapshot}} properly at load
> time.
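
The two races described above can be illustrated with a small, language-neutral
sketch. This is not Spark's code: the names Snapshot, SnapshotHolder, load,
commit and take_for_upload are hypothetical, and clearing the pending snapshot
on load is only one possible reading of "setting it properly at load time".
The sketch assumes a single lock guards every read and write of the shared
snapshot reference, and that the maintenance thread takes ownership of the
snapshot instead of reading a field the task thread may overwrite.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a local snapshot awaiting upload."""
    version: int


class SnapshotHolder:
    """Hypothetical holder for the pending snapshot shared by both threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._latest: Optional[Snapshot] = None
        self._loaded_version: Optional[int] = None

    def load(self, version: int) -> None:
        # Task thread: loading a version discards any pending snapshot
        # (an assumption of this sketch), so a snapshot left over from a
        # newer version is never uploaded after an older one is loaded.
        with self._lock:
            self._loaded_version = version
            self._latest = None

    def commit(self, version: int) -> None:
        # Task thread: publish the snapshot for the version just committed.
        with self._lock:
            self._latest = Snapshot(version)

    def take_for_upload(self) -> Optional[Snapshot]:
        # Maintenance thread: atomically take ownership of the pending
        # snapshot. A later commit replaces the shared reference but cannot
        # change the (immutable) object returned here.
        with self._lock:
            snapshot, self._latest = self._latest, None
            return snapshot


holder = SnapshotHolder()
holder.load(0)
holder.commit(5)                  # snapshot created for version 5
holder.load(3)                    # an older version is loaded afterwards
stale = holder.take_for_upload()  # nothing pending: version 5 was discarded

holder.commit(4)                  # new snapshot on top of the loaded version
fresh = holder.take_for_upload()  # maintenance thread now owns version 4
```

In this sketch the immutable Snapshot plus the swap under the lock addresses
the first race, and the reset inside load addresses the second.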
--
This message was sent by Atlassian Jira
(v8.20.10#820010)