Matthias Pohl created FLINK-26606:
-------------------------------------

             Summary: CompletedCheckpoints that failed to be discarded are not 
stored in the CompletedCheckpointStore
                 Key: FLINK-26606
                 URL: https://issues.apache.org/jira/browse/FLINK-26606
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 1.15.0
            Reporter: Matthias Pohl


We introduced a repeatable per-job cleanup that runs after the job has 
reached a globally-terminated state. It also tries to clean up the 
{{CompletedCheckpointStore}}. However, we missed one code path: the 
{{CheckpointsCleaner}} tries to discard {{CompletedCheckpoints}} that the 
{{CompletedCheckpointStore}} no longer holds any references to. If such a 
discard fails, the shutdown at the end of the job is not able to clean these 
checkpoints up.

We should not remove a {{CompletedCheckpoint}} from the 
{{CompletedCheckpointStore}} if its deletion failed. Keeping it in the store 
would enable us to retry deleting these artifacts at the end of the job and 
to cover them in the retryable cleanup as well.
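
The proposed behavior can be sketched as follows. This is only a minimal 
illustration, not Flink's implementation: {{Checkpoint}}, {{Store}}, 
{{FlakyCheckpoint}} and {{tryDiscardAll}} are hypothetical names made up for 
this example and do not correspond to Flink's actual classes or methods. The 
point it shows is that an entry leaves the store only after its discard 
succeeded, so a later cleanup pass can retry the ones that failed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class RetainOnFailedDiscardSketch {

    /** Hypothetical stand-in for a completed checkpoint; not Flink's class. */
    interface Checkpoint {
        long id();

        void discard() throws Exception;
    }

    /** Hypothetical store that forgets a checkpoint only after a successful discard. */
    static final class Store {
        private final Deque<Checkpoint> retained = new ArrayDeque<>();

        void add(Checkpoint checkpoint) {
            retained.addLast(checkpoint);
        }

        int size() {
            return retained.size();
        }

        /** Tries to discard every checkpoint; returns the ids that are still pending. */
        List<Long> tryDiscardAll() {
            List<Long> stillPending = new ArrayList<>();
            Iterator<Checkpoint> it = retained.iterator();
            while (it.hasNext()) {
                Checkpoint checkpoint = it.next();
                try {
                    checkpoint.discard();
                    // Drop the reference only once the artifacts are really gone.
                    it.remove();
                } catch (Exception e) {
                    // Keep the reference so that a later cleanup pass can retry.
                    stillPending.add(checkpoint.id());
                }
            }
            return stillPending;
        }
    }

    /** Test double (assumption): discard fails a fixed number of times, then succeeds. */
    static final class FlakyCheckpoint implements Checkpoint {
        private final long id;
        private int failuresLeft;

        FlakyCheckpoint(long id, int failuresLeft) {
            this.id = id;
            this.failuresLeft = failuresLeft;
        }

        @Override
        public long id() {
            return id;
        }

        @Override
        public void discard() throws Exception {
            if (failuresLeft > 0) {
                failuresLeft--;
                throw new Exception("simulated discard failure for checkpoint " + id);
            }
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.add(new FlakyCheckpoint(1L, 0)); // discards on the first attempt
        store.add(new FlakyCheckpoint(2L, 1)); // fails once, then discards

        System.out.println(store.tryDiscardAll()); // prints [2]
        System.out.println(store.tryDiscardAll()); // prints []
    }
}
```

In this sketch the second call plays the role of the retryable cleanup at 
the end of the job: checkpoint 2 is still known to the store after the first 
failed attempt, so it can be discarded on the retry.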



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
