[ 
https://issues.apache.org/jira/browse/FLINK-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15907107#comment-15907107
 ] 

ASF GitHub Bot commented on FLINK-6027:
---------------------------------------

GitHub user shixiaogang opened a pull request:

    https://github.com/apache/flink/pull/3521

    [FLINK-6027][checkpoint] Ignore the exception thrown by the subsuming of 
completed checkpoints

    Exceptions thrown while subsuming old checkpoints are now ignored. 
`CompletedCheckpointStore#addCheckpoint` now throws an exception only when the 
completed checkpoint could not be written to the store. In that case, it is 
safe for the coordinator to delete the states in the checkpoint, because 
recovery from that checkpoint is impossible.
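
    As a rough illustration of the behavior described above, the following is a 
minimal sketch and not the code in this pull request. The class 
`BoundedCheckpointStore`, its nested `Checkpoint` interface, and the `persist` 
hook are hypothetical stand-ins invented for the example. It assumes that 
writing the new checkpoint and subsuming old ones are two separate steps, and 
that only a failure in the first step should reach the caller; subsuming 
failures are logged at WARNING level.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical, simplified stand-in for a completed checkpoint store (not Flink's class). */
class BoundedCheckpointStore {

    /** Hypothetical checkpoint handle; discard() may fail, e.g. on a storage error. */
    interface Checkpoint {
        long id();
        void discard() throws Exception;
    }

    private static final Logger LOG =
            Logger.getLogger(BoundedCheckpointStore.class.getName());

    private final Deque<Checkpoint> checkpoints = new ArrayDeque<>();
    private final int maxRetained;

    BoundedCheckpointStore(int maxRetained) {
        if (maxRetained < 1) {
            throw new IllegalArgumentException("maxRetained must be at least 1");
        }
        this.maxRetained = maxRetained;
    }

    /** Throws only if the new checkpoint could not be written to the store. */
    void addCheckpoint(Checkpoint checkpoint) throws Exception {
        // Step 1: write the new checkpoint. A failure here propagates, so the
        // caller knows the checkpoint is not in the store and may delete its state.
        persist(checkpoint);
        checkpoints.addLast(checkpoint);

        // Step 2: subsume the oldest checkpoints. A failure here is logged and
        // swallowed, because the new checkpoint is already safely stored.
        while (checkpoints.size() > maxRetained) {
            Checkpoint oldest = checkpoints.removeFirst();
            try {
                oldest.discard();
            } catch (Exception e) {
                LOG.log(Level.WARNING,
                        "Failed to subsume checkpoint " + oldest.id(), e);
            }
        }
    }

    /** Hypothetical hook for durable storage; a no-op in this in-memory sketch. */
    protected void persist(Checkpoint checkpoint) throws Exception {
    }

    /** Returns the ids of the retained checkpoints, oldest first. */
    List<Long> retainedIds() {
        List<Long> ids = new ArrayList<>();
        for (Checkpoint c : checkpoints) {
            ids.add(c.id());
        }
        return ids;
    }
}
```

    With this split, a caller that catches an exception from `addCheckpoint` 
can assume the new checkpoint never reached the store. The cost is that a 
failed `discard()` may leave data of the old checkpoint behind.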

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shixiaogang/flink flink-6027

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3521.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3521
    
----
commit e89e947e74693ef1d5fdcfaebdc1818b138f2fd1
Author: xiaogang.sxg <[email protected]>
Date:   2017-03-13T09:03:42Z

    Ignore the exception thrown by the subsuming of completed checkppoints

commit 9ba89c42ed4751c68cf9520032e40dc6e857212c
Author: xiaogang.sxg <[email protected]>
Date:   2017-03-13T10:04:30Z

    Change the log level to WARNING

----


> Ignore the exception thrown by the subsuming of old completed checkpoints
> -------------------------------------------------------------------------
>
>                 Key: FLINK-6027
>                 URL: https://issues.apache.org/jira/browse/FLINK-6027
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>            Reporter: Xiaogang Shi
>            Assignee: Xiaogang Shi
>
> When a checkpoint is added to the {{CompletedCheckpointStore}} via 
> {{addCheckpoint()}}, the oldest checkpoints are removed from the store if 
> the number of stored checkpoints exceeds the given limit. Subsuming the old 
> checkpoints may fail, causing {{addCheckpoint()}} to throw an exception that 
> is caught by the {{CheckpointCoordinator}}, which then deletes the states of 
> the new checkpoint. Because the new checkpoint is still in the store, the job 
> may later be recovered from it, but that recovery will fail because all of 
> the checkpoint's states have been deleted.
> We should ignore exceptions thrown while subsuming old checkpoints, because 
> we can always recover from the new checkpoint once it has been added to the 
> store successfully. Ignoring them may leave some stale data behind, but that 
> is acceptable because it can be cleaned up by the cleanup hook to be 
> introduced in the near future.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
