[jira] [Updated] (FLINK-38574) Avoid reusing re-uploaded sst files when checkpoint notification is delayed

Zakelly Lan (Jira) Mon, 27 Oct 2025 03:14:06 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-38574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Zakelly Lan updated FLINK-38574:
--------------------------------
    Description: 
It might be possible that, from the TM's perspective, the completion 
notification for the last checkpoint is processed after the next checkpoint 
has already started, assuming no concurrent checkpoints. If so, this might 
indicate a bug, because it can make `RocksIncrementalSnapshotStrategy` behave 
incorrectly and diverge from the state tracking maintained by the 
`SharedStateRegistry` on the JM side. For example, consider the following 
problematic event timeline for `RocksIncrementalSnapshotStrategy`:
 * Checkpoint 1 finishes with state handle A for 1.sst.
 * Checkpoint 2 starts, based on no previous checkpoint, so 1.sst is 
re-uploaded with handle B.
 * The TM receives the notification that checkpoint 1 has completed.
 * Checkpoint 2 finishes. The JM considers handle A subsumed.
 * Checkpoint 3 starts, based on checkpoint 1, so handle A is reused for 1.sst.
 * The TM receives the subsumption of checkpoint 1 and the completion 
notification for checkpoint 2.
 * Checkpoint 3 finishes. The JM receives the placeholder for handle A and 
reports 'Attempt to reference unknown state', because handle A was removed 
when checkpoint 2 finished.
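
The timeline above can be replayed with a small toy model. This is only a 
sketch under stated assumptions, not Flink code: `JobManagerRegistry` and 
`TaskManagerStrategy` are hypothetical stand-ins for the JM-side tracking 
and for `RocksIncrementalSnapshotStrategy`, the JM is assumed to retain only 
the latest completed checkpoint, and `LookupError` stands in for whatever 
exception the JM actually raises.

```python
class JobManagerRegistry:
    """Toy JM-side tracking (hypothetical, not Flink's SharedStateRegistry)."""

    def __init__(self):
        self.retained = {}  # checkpoint id -> set of handles it references

    def complete(self, checkpoint_id, new_handles, placeholders):
        known = set().union(*self.retained.values())
        unknown = set(placeholders) - known
        if unknown:
            raise LookupError(
                f"Attempt to reference unknown state: {sorted(unknown)}")
        # Assumption: only the latest checkpoint is retained, so older
        # checkpoints are subsumed and handles used only by them are dropped.
        self.retained = {checkpoint_id: set(new_handles) | set(placeholders)}


class TaskManagerStrategy:
    """Toy TM-side incremental snapshot bookkeeping (hypothetical)."""

    def __init__(self):
        self.uploaded = {}          # checkpoint id -> {sst file: handle}
        self.last_completed = None  # last checkpoint the TM was notified about
        self._uploads = 0

    def _upload(self, sst):
        handle = chr(ord("A") + self._uploads)  # handles are named A, B, ...
        self._uploads += 1
        return handle

    def snapshot(self, checkpoint_id, sst_files):
        # Reuse is based on the last checkpoint known to be complete.
        base = self.uploaded.get(self.last_completed, {})
        new_handles, placeholders, files = [], [], {}
        for sst in sst_files:
            if sst in base:
                handle = base[sst]           # send a placeholder only
                placeholders.append(handle)
            else:
                handle = self._upload(sst)   # (re-)upload the file
                new_handles.append(handle)
            files[sst] = handle
        self.uploaded[checkpoint_id] = files
        return new_handles, placeholders

    def notify_complete(self, checkpoint_id):
        self.last_completed = checkpoint_id
        self.uploaded = {cp: f for cp, f in self.uploaded.items()
                         if cp >= checkpoint_id}


def run_timeline():
    jm, tm = JobManagerRegistry(), TaskManagerStrategy()
    cp1 = tm.snapshot(1, ["1.sst"])  # uploads 1.sst as handle A
    jm.complete(1, *cp1)
    cp2 = tm.snapshot(2, ["1.sst"])  # notification for 1 not seen: re-upload as B
    tm.notify_complete(1)            # delayed notification arrives
    jm.complete(2, *cp2)             # checkpoint 1 subsumed, handle A dropped
    cp3 = tm.snapshot(3, ["1.sst"])  # based on checkpoint 1: reuses handle A
    tm.notify_complete(2)
    try:
        jm.complete(3, *cp3)
    except LookupError as e:
        return str(e)
    return None


print(run_timeline())
```

In this model the failure comes from the TM choosing checkpoint 1 as its 
base after the JM has already replaced handle A with handle B. If 
`tm.notify_complete(1)` is moved before `tm.snapshot(2, ...)`, checkpoint 2 
reuses handle A and the error does not occur.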

  was:
It might be possible that from the TM's perspective, the checkpoint 
notification for last checkpoint is executed after the start of the next 
checkpoint, assuming no concurrent checkpoints? If so, this might indicate a 
bug. I'm saying that because this may make the 
`RocksIncrementalSnapshotStrategy` behaves wrong, diverging from the state 
tracking maintained by the `SharedStateRegistry` on the JM side. For example, 
considering a buggy event timeline for `RocksIncrementalSnapshotStrategy`:
 * Checkpoint 1 finished with state handle A for 1.sst.
 * Checkpoint 2 start, based on no checkpoint, so re-upload the 1.sst with 
handle B
 * Received Checkpoint 1's notification of finish.
 * Checkpoint 2 finished. JM thinks the handle A is subsumed.
 * Checkpoint 3 start, based on checkpoint 1, for 1.sst we reuse the handle A.
 * Received Checkpoint 1's subsume, Checkpoint 2's notification of finish.
 * Checkpoint 3 finished. JM received the handle A's placeholder and tell 
'Attempt to reference unknown state', since the handle A is removed when 
checkpoint 2 finished.


> Avoid reusing re-uploaded sst files when checkpoint notification is delayed
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-38574
>                 URL: https://issues.apache.org/jira/browse/FLINK-38574
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / State Backends
>    Affects Versions: 2.0.1, 1.20.3, 2.1.1
>            Reporter: Zakelly Lan
>            Assignee: Zakelly Lan
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
