[ 
https://issues.apache.org/jira/browse/FLINK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510455#comment-17510455
 ] 

Matthias Pohl commented on FLINK-26742:
---------------------------------------

That's because the {{DefaultCompletedCheckpointStore}} uses the 
{{CheckpointsCleaner}} to do the actual discarding of the checkpoints. The 
{{DefaultCompletedCheckpointStore}} is essentially a wrapper around the 
{{StateHandleStore}}. The {{StateHandleStore}} stores the serialized version of 
the {{CompletedCheckpoint}} in a local file (i.e. the internally used 
{{RetrievableStateHandle}}). The discard logic of the {{CompletedCheckpoint}} is 
handled outside of the {{StateHandleStore}} through a dedicated object of type 
{{Checkpoint.DiscardObject}} that implements {{discard}} and discards the 
metadata and any operator state.
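
To make the consequence of that split concrete, here is a minimal sketch in 
plain Java. It does not use any Flink classes: {{InMemoryHandleStore}}, 
{{DiscardAction}} and {{ShutdownOrderSketch}} are hypothetical stand-ins for 
the {{StateHandleStore}}, the {{Checkpoint.DiscardObject}} and the shutdown 
logic. It only illustrates the ordering problem described in this issue and 
one possible alternative ordering, not the actual implementation or fix.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the discard logic that lives outside the store.
interface DiscardAction {
    void discard() throws Exception;
}

// Hypothetical stand-in for a handle store: it only tracks entries by name.
class InMemoryHandleStore {
    private final Map<String, DiscardAction> entries = new LinkedHashMap<>();

    void add(String name, DiscardAction action) {
        entries.put(name, action);
    }

    DiscardAction get(String name) {
        return entries.get(name);
    }

    DiscardAction remove(String name) {
        return entries.remove(name);
    }

    // Returns a copy so callers can remove entries while iterating.
    List<String> names() {
        return new ArrayList<>(entries.keySet());
    }
}

class ShutdownOrderSketch {

    // Ordering described in the issue: the entry is removed first and the
    // discard runs afterwards. A failed discard leaves no entry to retry on.
    static List<String> removeThenDiscard(InMemoryHandleStore store) {
        List<String> failed = new ArrayList<>();
        for (String name : store.names()) {
            DiscardAction action = store.remove(name);
            try {
                action.discard();
            } catch (Exception e) {
                failed.add(name);
            }
        }
        return failed;
    }

    // Alternative ordering (an assumption, not taken from the Flink code):
    // discard first and remove the entry only if the discard succeeded.
    static List<String> discardThenRemove(InMemoryHandleStore store) {
        List<String> failed = new ArrayList<>();
        for (String name : store.names()) {
            try {
                store.get(name).discard();
                store.remove(name);
            } catch (Exception e) {
                failed.add(name);
            }
        }
        return failed;
    }

    // Builds a store with one checkpoint that discards fine and one that fails.
    static InMemoryHandleStore storeWithFailingCheckpoint() {
        InMemoryHandleStore store = new InMemoryHandleStore();
        store.add("chk-1", () -> { });
        store.add("chk-2", () -> {
            throw new IOException("simulated discard failure");
        });
        return store;
    }

    public static void main(String[] args) {
        InMemoryHandleStore first = storeWithFailingCheckpoint();
        System.out.println("remove-then-discard failed: " + removeThenDiscard(first));
        System.out.println("entries left for a retry:   " + first.names());

        InMemoryHandleStore second = storeWithFailingCheckpoint();
        System.out.println("discard-then-remove failed: " + discardThenRemove(second));
        System.out.println("entries left for a retry:   " + second.names());
    }
}
```

In the first variant the failing checkpoint is reported as failed but is no 
longer present in the store, so a retry has nothing to work with. In the second 
variant the failing entry stays in the store and can be picked up again.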

> DefaultCompletedCheckpointStore.shutdown does not clean the checkpoints 
> atomically
> ----------------------------------------------------------------------------------
>
>                 Key: FLINK-26742
>                 URL: https://issues.apache.org/jira/browse/FLINK-26742
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.15.0
>            Reporter: Matthias Pohl
>            Priority: Critical
>
> The {{DefaultCompletedCheckpointStore.shutdown}} removes the checkpoint entry 
> from the {{StateHandleStore}} and runs the actual cleanup of the checkpoint 
> only after the entry has been removed. That means that if there's an error 
> while discarding the {{CompletedCheckpoint}}, its data is left behind: the 
> entry is already gone, so the checkpoint doesn't get picked up anymore 
> during a retry.


