[ https://issues.apache.org/jira/browse/FLINK-22494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann closed FLINK-22494.
---------------------------------
    Resolution: Fixed

Fixed via

1.14.0:
9d2e2d980e
cc59ad5e62
417cf78fd0
e632591623
fa0d1dc3d5

1.13.1:
a24b0d86d7
6ce9cc900c
b5c49ef484
a6c13e79b6
13bc663802

1.12.5:
a53a1f3e99
95bd043f0a
b81887e647
5aec1bb949
6b53f8b0e0

> Avoid discarding checkpoints in case of failure
> -----------------------------------------------
>
>                 Key: FLINK-22494
>                 URL: https://issues.apache.org/jira/browse/FLINK-22494
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing, Runtime / Coordination
>    Affects Versions: 1.13.0, 1.14.0, 1.12.3
>            Reporter: Matthias
>            Assignee: Matthias
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.14.0, 1.13.1, 1.12.5
>
>
> Both {{StateHandleStore}} implementations (i.e. 
> [KubernetesStateHandleStore:157|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/highavailability/KubernetesStateHandleStore.java#L157]
>  and 
> [ZooKeeperStateHandleStore:170|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/zookeeper/ZooKeeperStateHandleStore.java#L170])
>  discard the checkpoint if its metadata could not be written to the 
> backend. 
> This does not cover the cases where the data was actually written to the 
> backend but the call failed anyway (e.g. due to network issues). In such a 
> case, the backend might end up holding a pointer to a checkpoint that has 
> already been discarded.
> Instead of discarding the checkpoint data in this case, we might want to 
> keep it. Otherwise, we might run into exceptions when recovering from the 
> checkpoint later on. We might also want to log a warning that points the 
> user to the possibly orphaned checkpoint data.
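
The distinction described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commits listed in this message: all class, interface, and method names below (PointerBackend, CheckpointPointerStore, store, and both exception types) are hypothetical stand-ins. The idea is that checkpoint data is discarded only when the backend write definitely failed; when the outcome is unknown, the data is kept and a warning is recorded.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical: the pointer was definitely NOT written to the backend. */
class DefiniteWriteFailureException extends Exception {
    DefiniteWriteFailureException(String message) { super(message); }
}

/** Hypothetical: the call failed, but the pointer may have been written anyway. */
class PossiblyInconsistentWriteException extends Exception {
    PossiblyInconsistentWriteException(String message) { super(message); }
}

/** Hypothetical stand-in for the HA backend (e.g. ZooKeeper or Kubernetes). */
interface PointerBackend {
    void writePointer(String name, String checkpointPath)
            throws DefiniteWriteFailureException, PossiblyInconsistentWriteException;
}

/** Hypothetical store that decides whether checkpoint data may be discarded. */
class CheckpointPointerStore {
    private final PointerBackend backend;
    // Recorded in memory so the behaviour is easy to inspect in this sketch.
    final List<String> discarded = new ArrayList<>();
    final List<String> warnings = new ArrayList<>();

    CheckpointPointerStore(PointerBackend backend) {
        this.backend = backend;
    }

    /** Returns true if the pointer was stored successfully. */
    boolean store(String name, String checkpointPath) {
        try {
            backend.writePointer(name, checkpointPath);
            return true;
        } catch (DefiniteWriteFailureException e) {
            // No pointer can exist in the backend, so discarding is safe.
            discarded.add(checkpointPath);
            return false;
        } catch (PossiblyInconsistentWriteException e) {
            // A pointer might exist: keep the data and warn about a possible orphan.
            warnings.add("Possibly orphaned checkpoint data at "
                    + checkpointPath + ": " + e.getMessage());
            return false;
        }
    }
}
```

In this sketch the backend is responsible for classifying its own failures; a network error after the request was sent would map to the "possibly inconsistent" case, because the caller cannot tell whether the write was applied.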



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
