[ 
https://issues.apache.org/jira/browse/FLINK-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974912#comment-15974912
 ] 

Stephan Ewen commented on FLINK-6315:
-------------------------------------

[~sjwiesman] Adding a few bits of information here about the behavior of 
checkpoints:

  - The {{notifyCheckpointComplete}} messages are always sent out, but if a 
failure occurs after the master "commits" the checkpoint and before the 
messages arrive, all TaskManagers may cancel their tasks, and no task will 
receive that message.
  - The same may hold for any {{notifyCheckpointTimeout}} message.
  - Some tasks may complete their checkpoint while others fail before 
completing theirs. In that case the checkpoint is neither complete nor timed 
out; it has simply failed.

I am wondering whether a timeout should be handled any differently from other 
failures.
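
As a rough illustration of the point above (to a task, a timeout, a failure, 
and a lost notification all look the same: no completion message arrives), 
here is a minimal sketch in which output stays staged per checkpoint and is 
committed only when some later completion notification does arrive. 
PendingCommitTracker and its methods are hypothetical names for this example, 
not Flink API, and the sketch assumes that a completed checkpoint also covers 
everything staged under earlier checkpoint IDs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of Flink): tracks output staged per checkpoint. */
public class PendingCommitTracker {
    // Batches staged at a checkpoint but not yet confirmed, ordered by checkpoint ID.
    private final TreeMap<Long, List<String>> pending = new TreeMap<>();
    // Records written since the last snapshot.
    private List<String> current = new ArrayList<>();
    // Stand-in for the external system: records that have been committed.
    private final List<String> committed = new ArrayList<>();

    public void write(String record) {
        current.add(record);
    }

    /** Called when a checkpoint is taken: stage the current batch under its ID. */
    public void snapshot(long checkpointId) {
        pending.put(checkpointId, current);
        current = new ArrayList<>();
    }

    /**
     * Commits every batch staged at or before the completed checkpoint.
     * Batches whose own notification never arrived (timeout, failure,
     * lost message) are picked up here by a later checkpoint.
     */
    public void notifyCheckpointComplete(long checkpointId) {
        Map<Long, List<String>> ready = pending.headMap(checkpointId, true);
        for (List<String> batch : ready.values()) {
            committed.addAll(batch);
        }
        ready.clear(); // headMap is a view, so this also removes them from pending
    }

    public List<String> committed() {
        return committed;
    }

    public int pendingCheckpoints() {
        return pending.size();
    }
}
```

With this shape, a checkpoint that never gets confirmed needs no special 
callback: its batch simply stays pending until the next notification arrives.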


> Notify on checkpoint timeout 
> -----------------------------
>
>                 Key: FLINK-6315
>                 URL: https://issues.apache.org/jira/browse/FLINK-6315
>             Project: Flink
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Seth Wiesman
>            Assignee: Seth Wiesman
>
> A common use case when writing a custom operator that outputs data to some 
> third-party location is to partially output on checkpoint and then commit on 
> notifyCheckpointComplete. If that external system does not gracefully handle 
> rollbacks (such as Amazon S3 not allowing consistent delete operations), then 
> that data needs to be handled by the next checkpoint. 
> The idea is to add a new interface, similar to CheckpointListener, that 
> provides a callback when the CheckpointCoordinator times out a checkpoint:
> {code:java}
> /**
>  * This interface must be implemented by functions/operations that want to
>  * receive a notification if a checkpoint has been timed out by the
>  * {@link org.apache.flink.runtime.checkpoint.CheckpointCoordinator}.
>  */
> public interface CheckpointTimeoutListener {
>       /**
>        * This method is called as a notification if a distributed checkpoint
>        * has been timed out.
>        *
>        * @param checkpointId The ID of the checkpoint that has been timed out.
>        * @throws Exception
>        */
>       void notifyCheckpointTimeout(long checkpointId) throws Exception;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
