[ https://issues.apache.org/jira/browse/FLINK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510455#comment-17510455 ]
Matthias Pohl commented on FLINK-26742:
---------------------------------------
This is because the {{DefaultCompletedCheckpointStore}} uses the {{CheckpointsCleaner}} to
finally discard the checkpoints. The {{DefaultCompletedCheckpointStore}} is essentially
a wrapper around the {{StateHandleStore}}. The {{StateHandleStore}} stores the serialized
version of the {{CompletedCheckpoint}} in a local file (i.e. the internally used
{{RetrievableStateHandle}}). The discard logic of the {{CompletedCheckpoint}} is
handled outside of the {{StateHandleStore}} through a dedicated object of type
{{Checkpoint.DiscardObject}}, whose {{discard}} method deletes the
metadata and any operator state.
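To make this division of responsibilities concrete, here is a minimal Java sketch. All types are simplified, hypothetical stand-ins named after the classes mentioned above; they are not Flink's actual API, and the methods {{discardObject}}, {{cleanAsync}} and {{removeCheckpoint}} are made up for illustration. The point it shows: the {{StateHandleStore}} only keeps track of entries, while the checkpoint data is deleted by a separate {{DiscardObject}} that the cleaner runs.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;

// Hypothetical sketch of the layering; none of these types are Flink's real classes.
class CheckpointStoreLayeringSketch {

    /** Stand-in for Checkpoint.DiscardObject: deletes metadata and operator state. */
    interface DiscardObject {
        void discard() throws Exception;
    }

    /** Stand-in for CompletedCheckpoint; records what has been discarded. */
    static final class CompletedCheckpoint {
        final long id;
        boolean metadataDiscarded;
        boolean operatorStateDiscarded;

        CompletedCheckpoint(long id) {
            this.id = id;
        }

        /** Hypothetical factory for the object that owns the discard logic. */
        DiscardObject discardObject() {
            return () -> {
                metadataDiscarded = true;
                operatorStateDiscarded = true;
            };
        }
    }

    /** Stand-in for StateHandleStore: bookkeeping of entries only, no discard logic. */
    static final class StateHandleStore {
        final Map<String, CompletedCheckpoint> entries = new ConcurrentHashMap<>();

        void add(String name, CompletedCheckpoint checkpoint) {
            entries.put(name, checkpoint);
        }

        CompletedCheckpoint remove(String name) {
            return entries.remove(name);
        }
    }

    /** Stand-in for CheckpointsCleaner: runs a DiscardObject on the given executor. */
    static final class CheckpointsCleaner {
        final List<Exception> failures = new CopyOnWriteArrayList<>();

        void cleanAsync(DiscardObject discardObject, Executor executor) {
            executor.execute(() -> {
                try {
                    discardObject.discard();
                } catch (Exception e) {
                    failures.add(e); // the sketch only records the failure
                }
            });
        }
    }

    /** Stand-in for DefaultCompletedCheckpointStore: wraps the StateHandleStore. */
    static final class CompletedCheckpointStore {
        final StateHandleStore stateHandleStore = new StateHandleStore();
        final CheckpointsCleaner cleaner = new CheckpointsCleaner();

        void addCheckpoint(CompletedCheckpoint checkpoint) {
            stateHandleStore.add("checkpoint-" + checkpoint.id, checkpoint);
        }

        /** Removing the entry and discarding the data are two separate steps. */
        void removeCheckpoint(long id, Executor executor) {
            CompletedCheckpoint checkpoint = stateHandleStore.remove("checkpoint-" + id);
            if (checkpoint != null) {
                cleaner.cleanAsync(checkpoint.discardObject(), executor);
            }
        }
    }
}
```

Passing a direct executor such as {{Runnable::run}} makes the discard run synchronously, which is convenient for trying the sketch out.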
> DefaultCompletedCheckpointStore.shutdown does not clean the checkpoints
> atomically
> ----------------------------------------------------------------------------------
>
> Key: FLINK-26742
> URL: https://issues.apache.org/jira/browse/FLINK-26742
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.15.0
> Reporter: Matthias Pohl
> Priority: Critical
>
> {{DefaultCompletedCheckpointStore.shutdown}} removes the checkpoint entry
> from the {{StateHandleStore}} and only then runs the actual cleanup of the
> checkpoint. If an error occurs while discarding the {{CompletedCheckpoint}},
> the entry is already gone, so the information needed to clean it up is lost
> and the checkpoint is not picked up again during a retry.
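A minimal sketch of the ordering problem described in the ticket, assuming a simplified in-memory store and a discard that fails once with a transient error. The types and methods ({{HandleStore}}, {{FlakyDiscard}}, {{shutdownRemoveThenDiscard}}, {{shutdownDiscardThenRemove}}) are hypothetical and not Flink's API. The second method only illustrates one possible ordering that keeps the entry until discarding has succeeded; it is not necessarily the fix adopted for this ticket.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not Flink's actual classes or API.
class ShutdownOrderingSketch {

    /** Stand-in for Checkpoint.DiscardObject. */
    interface DiscardObject {
        void discard() throws Exception;
    }

    /** Stand-in for the StateHandleStore: checkpoint name -> discard handle. */
    static final class HandleStore {
        final Map<String, DiscardObject> entries = new LinkedHashMap<>();
    }

    /** Discard that fails on the first attempt and succeeds afterwards. */
    static final class FlakyDiscard implements DiscardObject {
        int attempts = 0;
        boolean discarded = false;

        public void discard() throws Exception {
            attempts++;
            if (attempts == 1) {
                throw new Exception("transient storage failure");
            }
            discarded = true;
        }
    }

    /** Problematic ordering: remove the entry first, then discard. */
    static List<String> shutdownRemoveThenDiscard(HandleStore store) {
        List<String> failed = new ArrayList<>();
        for (String name : new ArrayList<>(store.entries.keySet())) {
            DiscardObject handle = store.entries.remove(name); // reference is gone now
            try {
                handle.discard();
            } catch (Exception e) {
                failed.add(name); // nothing left in the store for a retry to find
            }
        }
        return failed;
    }

    /** Alternative ordering: remove the entry only after discard succeeded. */
    static List<String> shutdownDiscardThenRemove(HandleStore store) {
        List<String> failed = new ArrayList<>();
        for (String name : new ArrayList<>(store.entries.keySet())) {
            try {
                store.entries.get(name).discard();
                store.entries.remove(name);
            } catch (Exception e) {
                failed.add(name); // entry is kept, so a retry picks it up again
            }
        }
        return failed;
    }
}
```

With the first ordering, a second shutdown attempt finds an empty store and the failed checkpoint's data is never discarded; with the second ordering, the retry finds the entry and discards it.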
--
This message was sent by Atlassian Jira
(v8.20.1#820001)