[ 
https://issues.apache.org/jira/browse/FLINK-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15975203#comment-15975203
 ] 

Stephan Ewen commented on FLINK-6315:
-------------------------------------

I think you are thinking about it the right way. When checkpoint 2 does not 
happen for whatever reason, checkpoint 3 should be in charge of everything 
since the last successful checkpoint.

I see the problem now: When checkpoint 3 starts, you may not yet know whether 
checkpoint 2 is actually going to complete. To make it trickier, checkpoint 2 
may even fail (due to a timeout) after checkpoint 3 completes.
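
A minimal sketch of that idea, assuming a sink that stages output per 
checkpoint ID and, when a checkpoint completes, commits everything staged up 
to that ID. PendingCommitTracker and its methods are hypothetical names used 
for illustration, not Flink classes, and the strings stand in for staged files 
or parts:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Flink.
class PendingCommitTracker {
    // Output staged but not yet committed, keyed by the checkpoint that staged it.
    private final TreeMap<Long, List<String>> pending = new TreeMap<>();
    // Output that has been committed to the external system.
    private final List<String> committed = new ArrayList<>();

    // Called when a checkpoint is triggered: stage what was written since the last trigger.
    void stage(long checkpointId, List<String> data) {
        pending.computeIfAbsent(checkpointId, id -> new ArrayList<>()).addAll(data);
    }

    // Called when a checkpoint completes: commit everything staged for this
    // checkpoint and for any earlier checkpoint that has not completed.
    void onCheckpointComplete(long checkpointId) {
        NavigableMap<Long, List<String>> covered = pending.headMap(checkpointId, true);
        for (List<String> batch : covered.values()) {
            committed.addAll(batch);
        }
        covered.clear(); // headMap is a view, so this also removes the entries from pending
    }

    // Called when a checkpoint fails or times out. Staged data stays pending until
    // a later checkpoint completes; if one already did, nothing is left to do.
    void onCheckpointFailed(long checkpointId) {
        // Intentionally empty in this sketch.
    }

    List<String> committed() {
        return committed;
    }

    boolean hasPending() {
        return !pending.isEmpty();
    }
}
```

With this shape, a late failure of checkpoint 2 after checkpoint 3 has 
completed is a no-op, because checkpoint 3 already covered the data staged for 
checkpoint 2.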

In the incremental checkpointing code, we have a similar problem. There, we 
can only re-reference a diff if it is part of a completed checkpoint. If, for 
example, checkpoint 2 is not complete when checkpoint 3 is started, then 
checkpoint 3 builds on checkpoint 1, not on checkpoint 2.
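
The base selection described above could be sketched roughly as follows. This 
is an illustration only, not the actual incremental checkpointing code; 
IncrementalBaseSelector, markCompleted, and baseFor are hypothetical names, and 
the -1 return value is an assumed marker for "no completed base available":

```java
import java.util.TreeSet;

// Hypothetical sketch; not Flink's incremental checkpointing implementation.
class IncrementalBaseSelector {
    // IDs of checkpoints that are known to have completed.
    private final TreeSet<Long> completed = new TreeSet<>();

    // Record that a checkpoint has completed and may be used as a base.
    void markCompleted(long checkpointId) {
        completed.add(checkpointId);
    }

    // Pick the newest completed checkpoint older than the one being started.
    // Checkpoints that are still in flight are never chosen as a base.
    // Returns -1 if no completed checkpoint exists (assumed convention).
    long baseFor(long newCheckpointId) {
        Long base = completed.lower(newCheckpointId);
        return base == null ? -1L : base;
    }
}
```

For example, if only checkpoint 1 has been marked as completed when checkpoint 
3 starts, baseFor(3) returns 1, even though checkpoint 2 was triggered earlier.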

[~aljoscha] How is that handled in the regular bucketing sink?

> Notify on checkpoint timeout 
> -----------------------------
>
>                 Key: FLINK-6315
>                 URL: https://issues.apache.org/jira/browse/FLINK-6315
>             Project: Flink
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Seth Wiesman
>            Assignee: Seth Wiesman
>
> A common use case when writing a custom operator that outputs data to some 
> third-party location is to partially output on checkpoint and then commit on 
> notifyCheckpointComplete. If that external system does not gracefully handle 
> rollbacks (such as Amazon S3 not allowing consistent delete operations), then 
> that data needs to be handled by the next checkpoint. 
> The idea is to add a new interface similar to CheckpointListener that 
> provides a callback when the CheckpointCoordinator times out a checkpoint:
> {code:java}
> /**
>  * This interface must be implemented by functions/operations that want to
>  * receive a notification if a checkpoint has been timed out by the
>  * {@link org.apache.flink.runtime.checkpoint.CheckpointCoordinator}.
>  */
> public interface CheckpointTimeoutListener {
>       /**
>        * This method is called as a notification if a distributed checkpoint
>        * has been timed out.
>        *
>        * @param checkpointId The ID of the checkpoint that has been timed out.
>        * @throws Exception
>        */
>       void notifyCheckpointTimeout(long checkpointId) throws Exception;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
