[ 
https://issues.apache.org/jira/browse/FLINK-26606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-26606:
----------------------------------
    Priority: Major  (was: Critical)

> CompletedCheckpoints that failed to be discarded are not stored in the 
> CompletedCheckpointStore
> -----------------------------------------------------------------------------------------------
>
>                 Key: FLINK-26606
>                 URL: https://issues.apache.org/jira/browse/FLINK-26606
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing, Runtime / Coordination
>    Affects Versions: 1.15.0
>            Reporter: Matthias Pohl
>            Priority: Major
>
> We introduced a repeatable per-job cleanup that runs after the job reaches a 
> globally-terminated state. It also tries to clean up the 
> {{CompletedCheckpointStore}}. But we missed one code path: the 
> {{CheckpointsCleaner}} also tries to discard {{CompletedCheckpoints}}. If that 
> discard fails, the {{CompletedCheckpointStore}} no longer holds any references 
> to these {{CompletedCheckpoints}}, so the shutdown at the end is not able to 
> clean them up.
> We should not remove the {{CompletedCheckpoints}} from the 
> {{CompletedCheckpointStore}} if the deletion failed. This would enable us to 
> retry deleting these artifacts at the end of the job and consider them in the 
> retryable cleanup as well.
> The documentation was updated to cover this issue. Fixing this issue should 
> also include removing the corresponding paragraph from the documentation (see 
> [related flink-docs PR|https://github.com/apache/flink/pull/19058]).
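
The proposed behavior (keep a checkpoint in the store until its discard has actually succeeded) could be sketched roughly as follows. This is only an illustration under stated assumptions, not Flink's implementation: DiscardableCheckpoint, FlakyCheckpoint and RetryingCheckpointStore are hypothetical names made up for this sketch and do not mirror the real CompletedCheckpointStore or CheckpointsCleaner APIs.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a completed checkpoint; not Flink's class. */
interface DiscardableCheckpoint {
    long id();

    void discard() throws Exception;
}

/** Hypothetical checkpoint whose discard fails a fixed number of times. */
class FlakyCheckpoint implements DiscardableCheckpoint {
    private final long id;
    private int failuresLeft;

    FlakyCheckpoint(long id, int failuresLeft) {
        this.id = id;
        this.failuresLeft = failuresLeft;
    }

    @Override
    public long id() {
        return id;
    }

    @Override
    public void discard() throws Exception {
        if (failuresLeft > 0) {
            failuresLeft--;
            throw new IOException("simulated discard failure for checkpoint " + id);
        }
    }
}

/** Hypothetical store that only forgets checkpoints that were discarded successfully. */
class RetryingCheckpointStore {
    private final Deque<DiscardableCheckpoint> checkpoints = new ArrayDeque<>();

    void add(DiscardableCheckpoint checkpoint) {
        checkpoints.addLast(checkpoint);
    }

    /** Tries to discard every checkpoint; returns the ids that failed and remain stored. */
    List<Long> discardAll() {
        List<Long> failed = new ArrayList<>();
        Iterator<DiscardableCheckpoint> it = checkpoints.iterator();
        while (it.hasNext()) {
            DiscardableCheckpoint checkpoint = it.next();
            try {
                checkpoint.discard();
                it.remove(); // drop the reference only after a successful discard
            } catch (Exception e) {
                failed.add(checkpoint.id()); // keep it so a later cleanup can retry
            }
        }
        return failed;
    }

    int size() {
        return checkpoints.size();
    }
}
```

In this sketch, a retryable cleanup at the end of the job would simply call discardAll() again until it returns an empty list, because every checkpoint whose deletion failed is still referenced by the store.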



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
