[ https://issues.apache.org/jira/browse/FLINK-26606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias Pohl updated FLINK-26606:
----------------------------------
    Priority: Major  (was: Critical)

> CompletedCheckpoints that failed to be discarded are not stored in the CompletedCheckpointStore
> -----------------------------------------------------------------------------------------------
>
>                 Key: FLINK-26606
>                 URL: https://issues.apache.org/jira/browse/FLINK-26606
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing, Runtime / Coordination
>    Affects Versions: 1.15.0
>            Reporter: Matthias Pohl
>            Priority: Major
>
> We introduced a repeatable per-job cleanup that runs after the job has
> reached a globally-terminated state. It also tries to clean up the
> {{CompletedCheckpointStore}}. But we missed one code path, in which the
> {{CheckpointsCleaner}} tries to discard {{CompletedCheckpoints}}. If that
> discard fails, the {{CompletedCheckpointStore}} no longer holds any
> references to these {{CompletedCheckpoints}}, so the shutdown at the end is
> not able to clean them up.
> We should not remove a {{CompletedCheckpoint}} from the
> {{CompletedCheckpointStore}} if its deletion failed. This would enable us to
> retry deleting these artifacts at the end of the job and to consider them in
> the retryable cleanup as well.
> The documentation was updated to cover this issue. Fixing this issue should
> also include removing the corresponding paragraph from the documentation (see
> [related flink-docs PR|https://github.com/apache/flink/pull/19058]).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)