Matthias Pohl created FLINK-26606:
-------------------------------------

Summary: CompletedCheckpoints that failed to be discarded are not stored in the CompletedCheckpointStore
Key: FLINK-26606
URL: https://issues.apache.org/jira/browse/FLINK-26606
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination
Affects Versions: 1.15.0
Reporter: Matthias Pohl
We introduced a repeatable per-job cleanup that runs after the job reaches a globally-terminal state. This cleanup also tries to clean up the {{CompletedCheckpointStore}}. However, we missed one code path: the {{CheckpointsCleaner}} tries to discard {{CompletedCheckpoints}} that the {{CompletedCheckpointStore}} no longer holds any references to. If such a deletion fails, the shutdown at the end of the job cannot clean these checkpoints up, because the store no longer knows about them.

We should not remove {{CompletedCheckpoints}} from the {{CompletedCheckpointStore}} if their deletion failed. Keeping them would allow us to retry deleting these artifacts at the end of the job and to include them in the retryable cleanup as well.
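A minimal sketch of the proposed behavior, in Java. All names below ({{RetainOnFailedDiscardStore}}, {{Discarder}}, {{addCheckpoint}}, {{shutdown}}, {{pendingDiscards}}) are hypothetical and do not correspond to Flink's actual {{CompletedCheckpointStore}} or {{CheckpointsCleaner}} APIs. Checkpoints are reduced to their IDs, and older checkpoints are discarded synchronously when a newer one exceeds the retention limit, which is a simplification of the asynchronous cleaner. The only point illustrated is that a checkpoint whose discard fails stays referenced by the store, so the final shutdown can retry it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch, not Flink's API: a checkpoint store that only forgets an
 * older checkpoint once its artifacts were actually deleted.
 */
final class RetainOnFailedDiscardStore {

    /** Hypothetical discard callback; throws if the checkpoint's artifacts could not be deleted. */
    interface Discarder {
        void discard(long checkpointId) throws Exception;
    }

    private final Deque<Long> retained = new ArrayDeque<>();      // checkpoints kept for recovery
    private final List<Long> failedDiscards = new ArrayList<>();  // discards that failed, kept for retry
    private final int maxRetained;

    RetainOnFailedDiscardStore(int maxRetained) {
        if (maxRetained < 1) {
            throw new IllegalArgumentException("maxRetained must be >= 1");
        }
        this.maxRetained = maxRetained;
    }

    /** Adds a checkpoint and tries to discard the oldest ones beyond the retention limit. */
    void addCheckpoint(long checkpointId, Discarder discarder) {
        retained.addLast(checkpointId);
        while (retained.size() > maxRetained) {
            long oldest = retained.removeFirst();
            if (!tryDiscard(oldest, discarder)) {
                // Keep a reference instead of dropping it, so the final cleanup can retry.
                failedDiscards.add(oldest);
            }
        }
    }

    /**
     * Called once the job is globally terminal: tries to discard every checkpoint the
     * store still knows about. Returns the IDs that still failed; they stay in the store,
     * so calling shutdown again retries only those (repeatable cleanup).
     */
    List<Long> shutdown(Discarder discarder) {
        List<Long> toDiscard = new ArrayList<>(failedDiscards);
        toDiscard.addAll(retained);
        List<Long> stillFailing = new ArrayList<>();
        for (long id : toDiscard) {
            if (!tryDiscard(id, discarder)) {
                stillFailing.add(id);
            }
        }
        retained.clear();
        failedDiscards.clear();
        failedDiscards.addAll(stillFailing);
        return stillFailing;
    }

    /** Snapshot of the checkpoints whose deletion failed and is still pending. */
    List<Long> pendingDiscards() {
        return new ArrayList<>(failedDiscards);
    }

    private static boolean tryDiscard(long checkpointId, Discarder discarder) {
        try {
            discarder.discard(checkpointId);
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

Calling {{shutdown}} again after a partial failure retries only the checkpoints that still could not be deleted, which mirrors the retryable cleanup described above.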