Xiaogang Shi created FLINK-6027:
-----------------------------------

             Summary: Ignore the exception thrown by the subsuming of old 
completed checkpoints
                 Key: FLINK-6027
                 URL: https://issues.apache.org/jira/browse/FLINK-6027
             Project: Flink
          Issue Type: Bug
          Components: State Backends, Checkpointing
            Reporter: Xiaogang Shi
            Assignee: Xiaogang Shi


When a checkpoint is added to the {{CompletedCheckpointStore}} via 
{{addCheckpoint()}}, the oldest checkpoints are removed from the store if the 
number of stored checkpoints exceeds the given limit. Subsuming the old 
checkpoints may fail, causing {{addCheckpoint()}} to throw an exception that is 
caught by the {{CheckpointCoordinator}}. The {{CheckpointCoordinator}} then 
deletes the states of the new checkpoint. Because the new checkpoint is still 
in the store, the job may later be recovered from it, but the recovery will 
fail because all of the checkpoint's states have been deleted.

We should ignore exceptions thrown while subsuming old checkpoints, because we 
can always recover from the new checkpoint once it has been successfully added 
to the store. Ignoring them may leave some stale data behind, but that is 
acceptable because it can be cleaned up by the cleanup hook that is to be 
introduced in the near future.
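
A minimal sketch of the proposed behaviour, to make the idea concrete. It uses 
a simplified stand-in rather than the actual Flink classes: 
{{SketchCheckpointStore}}, its nested {{Checkpoint}} interface, and 
{{getFailedDiscards()}} are hypothetical names introduced only for this 
illustration, and the real {{CompletedCheckpointStore}} implementations may 
look quite different.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical stand-in for a completed checkpoint store (not Flink's class). */
final class SketchCheckpointStore {

    /** Hypothetical minimal view of a completed checkpoint. */
    public interface Checkpoint {
        long getId();

        /** Deletes the checkpoint's state; may fail, e.g. on a file system error. */
        void discard() throws Exception;
    }

    private final int maxRetained;
    private final ArrayDeque<Checkpoint> checkpoints = new ArrayDeque<>();
    private final List<Long> failedDiscards = new ArrayList<>();

    SketchCheckpointStore(int maxRetained) {
        if (maxRetained < 1) {
            throw new IllegalArgumentException("maxRetained must be at least 1");
        }
        this.maxRetained = maxRetained;
    }

    /** Adds the new checkpoint first, then subsumes old ones without propagating failures. */
    void addCheckpoint(Checkpoint checkpoint) {
        checkpoints.addLast(checkpoint);

        while (checkpoints.size() > maxRetained) {
            Checkpoint oldest = checkpoints.removeFirst();
            try {
                oldest.discard();
            } catch (Exception e) {
                // Ignore the failure: the new checkpoint is already stored, so the
                // caller must not treat the add as failed and delete its state.
                // The leftover data is only recorded here for a later cleanup.
                failedDiscards.add(oldest.getId());
            }
        }
    }

    Checkpoint getLatest() {
        return checkpoints.peekLast();
    }

    int size() {
        return checkpoints.size();
    }

    /** IDs of subsumed checkpoints whose state could not be deleted. */
    List<Long> getFailedDiscards() {
        return failedDiscards;
    }
}
```

With this shape, a failure while discarding checkpoint N-1 no longer surfaces 
from {{addCheckpoint()}}, so the caller has no reason to delete the states of 
checkpoint N. The cost is the stale data of checkpoint N-1, which the sketch 
only records instead of cleaning up.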



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
