Matthias created FLINK-22494:
--------------------------------

             Summary: Avoid discarding checkpoints in case of failure
                 Key: FLINK-22494
                 URL: https://issues.apache.org/jira/browse/FLINK-22494
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Checkpointing, Runtime / Coordination
    Affects Versions: 1.13.0, 1.14.0, 1.12.3
            Reporter: Matthias
             Fix For: 1.14.0, 1.13.1, 1.12.4


Both {{StateHandleStore}} implementations (i.e. {{KubernetesStateHandleStore}}
and {{ZooKeeperStateHandleStore}}) discard a checkpoint when its metadata
could not be written to the backend.

This does not cover the case where the data was actually written to the
backend but the call still failed (e.g. due to network issues). In that
case, the backend may end up holding a pointer to a checkpoint that has
already been discarded.

Instead of discarding the checkpoint data in this case, we might want to keep
it. Otherwise, we might run into exceptions when recovering from that
checkpoint later on.
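
To make the proposal concrete, here is a minimal sketch of the intended behaviour. It is not the actual Flink code: {{CheckpointStoreSketch}}, {{Backend}}, {{CheckpointData}}, {{NotWrittenException}} and {{Outcome}} are hypothetical names invented for this illustration. The sketch assumes the backend can signal with a dedicated exception type that nothing was written; any other failure is treated as "outcome unknown", and the checkpoint data is kept.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these types are Flink classes.
final class CheckpointStoreSketch {

    enum Outcome { STORED, DISCARDED, KEPT_UNCERTAIN }

    /** Assumption: thrown only when the backend guarantees nothing was written. */
    static final class NotWrittenException extends Exception {
        NotWrittenException(String message) { super(message); }
    }

    /** Stand-in for the ZooKeeper or Kubernetes backend. */
    interface Backend {
        void put(String key, String pointer) throws Exception;
    }

    /** Stand-in for the checkpoint data that the stored pointer refers to. */
    static final class CheckpointData {
        boolean discarded = false;
        void discard() { discarded = true; }
    }

    static Outcome store(Backend backend, String key, CheckpointData data) {
        try {
            backend.put(key, "pointer-to-" + key);
            return Outcome.STORED;
        } catch (NotWrittenException e) {
            // No pointer can exist in the backend, so discarding is safe.
            data.discard();
            return Outcome.DISCARDED;
        } catch (Exception e) {
            // The write may have succeeded (e.g. lost acknowledgement):
            // keep the data so that a stored pointer never dangles.
            return Outcome.KEPT_UNCERTAIN;
        }
    }

    /** Simulates a backend that persists the pointer and then fails. */
    static Backend writesThenFails(Map<String, String> storage) {
        return (key, pointer) -> {
            storage.put(key, pointer);
            throw new IOException("connection lost after write");
        };
    }

    /** Simulates a backend that rejects the write outright. */
    static Backend rejects() {
        return (key, pointer) -> {
            throw new NotWrittenException("write rejected");
        };
    }

    public static void main(String[] args) {
        Map<String, String> storage = new HashMap<>();
        CheckpointData first = new CheckpointData();
        CheckpointData second = new CheckpointData();
        System.out.println(store(writesThenFails(storage), "chk-1", first)
                + " discarded=" + first.discarded);
        System.out.println(store(rejects(), "chk-2", second)
                + " discarded=" + second.discarded);
    }
}
```

The trade-off in this sketch is that data kept after an uncertain failure may turn out to be orphaned if the write did not go through, so some later cleanup would be needed. That seems preferable to a pointer in the backend that references a discarded checkpoint.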



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
