[ https://issues.apache.org/jira/browse/FLINK-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974912#comment-15974912 ]
Stephan Ewen commented on FLINK-6315:
-------------------------------------

[~sjwiesman] Adding a few bits of information here about the behavior of checkpoints:

  - The {{notifyCheckpointComplete}} messages are always sent out, but if a failure occurs after the master "commits" the checkpoint and before the messages arrive, then all TaskManagers may cancel their tasks and no task will receive that message.
  - The same may hold for any {{notifyCheckpointTimeout}} message.
  - It may be that some tasks complete their checkpoint and others fail before completing theirs. In that case the checkpoint is neither complete nor timed out, simply failed.

I am wondering whether a timeout should be handled differently from any other failure.

> Notify on checkpoint timeout
> ----------------------------
>
>                 Key: FLINK-6315
>                 URL: https://issues.apache.org/jira/browse/FLINK-6315
>             Project: Flink
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Seth Wiesman
>            Assignee: Seth Wiesman
>
> A common use case when writing a custom operator that outputs data to some
> third-party location is to partially output on checkpoint and then commit on
> notifyCheckpointComplete. If that external system does not gracefully handle
> rollbacks (such as Amazon S3 not allowing consistent delete operations) then
> that data needs to be handled by the next checkpoint.
> The idea is to add a new interface similar to CheckpointListener that
> provides a callback when the CheckpointCoordinator times out a checkpoint.
> {code:java}
> /**
>  * This interface must be implemented by functions/operations that want to receive
>  * a notification if a checkpoint has been timed out by the
>  * {@link org.apache.flink.runtime.checkpoint.CheckpointCoordinator}.
>  */
> public interface CheckpointTimeoutListener {
> 	/**
> 	 * This method is called as a notification if a distributed checkpoint has been timed out.
> 	 *
> 	 * @param checkpointId The ID of the checkpoint that has been timed out.
> 	 * @throws Exception
> 	 */
> 	void notifyCheckpointTimeout(long checkpointId) throws Exception;
> }
> {code}
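
To illustrate the point in the comment that a completion (or timeout) notification may never reach a task, here is a rough sketch of how an operator could avoid depending on every notification arriving. It is plain Java with no Flink dependency; {{PendingCommitTracker}}, {{stage}}, {{committed}} and {{pendingCheckpoints}} are hypothetical names invented for this example, and the rule "a notification for checkpoint N also covers everything staged for older checkpoints" is an assumption of the sketch rather than something stated in this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper, not part of Flink: tracks output staged per checkpoint
 * and commits it without assuming that every notification is delivered.
 */
class PendingCommitTracker {
    // Staged output keyed by checkpoint ID; TreeMap keeps older checkpoints first.
    private final TreeMap<Long, List<String>> pending = new TreeMap<>();
    private final List<String> committed = new ArrayList<>();

    /** Called while taking a checkpoint: remember what was partially written. */
    void stage(long checkpointId, String item) {
        pending.computeIfAbsent(checkpointId, id -> new ArrayList<>()).add(item);
    }

    /**
     * Called when a completion notification does arrive. Commits everything
     * staged for this checkpoint and for all older ones, so output whose own
     * notification was lost (or whose checkpoint failed or timed out) is
     * picked up by the next checkpoint that completes.
     */
    void notifyCheckpointComplete(long checkpointId) {
        Map<Long, List<String>> ready = pending.headMap(checkpointId, true);
        for (List<String> items : ready.values()) {
            committed.addAll(items);
        }
        // headMap returns a view, so clearing it removes the entries from pending.
        ready.clear();
    }

    List<String> committed() {
        return committed;
    }

    int pendingCheckpoints() {
        return pending.size();
    }
}
```

With this shape, a missed notification for checkpoint 1 only delays the commit until the notification for checkpoint 2 arrives. A real operator would additionally have to keep the pending entries in checkpointed state so that they survive a restart, which this sketch leaves out.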