[
https://issues.apache.org/jira/browse/FLINK-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974912#comment-15974912
]
Stephan Ewen commented on FLINK-6315:
-------------------------------------
[~sjwiesman] Adding a few bits of information here about the behavior of
checkpoints:
- The {{notifyCheckpointComplete}} messages are always sent out, but if a
failure occurs after the master "commits" the checkpoint and before the
messages arrive, then all TaskManagers may cancel their tasks and no task
will receive that message.
- The same may hold for any {{notifyCheckpointTimeout}} message.
- It may be that some tasks complete their checkpoint and others fail before
completing theirs. In that case the checkpoint is neither complete nor timed
out; it has simply failed.
I am wondering if a timeout should be handled differently to any other failure?
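
To illustrate the first point, here is a minimal sketch of an operator-side helper that does not depend on every notification arriving. Everything in it is hypothetical ({{PendingCommits}}, {{stage}}, {{recoverFrom}} are names made up for this example, not Flink API), and it assumes that the map of staged data is part of the checkpointed state and that a notification or restore for checkpoint N also covers data staged for earlier checkpoints.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper, not part of Flink: tracks output staged per checkpoint
 * and commits it without assuming that every notification arrives.
 */
class PendingCommits {
    // Staged output keyed by the checkpoint that snapshotted it, in ascending ID order.
    private final TreeMap<Long, List<String>> pending = new TreeMap<>();
    // Stand-in for the external system; a real sink would write there instead.
    private final List<String> committed = new ArrayList<>();

    /** Called while snapshotting for checkpointId: stage the records, do not commit yet. */
    void stage(long checkpointId, List<String> records) {
        pending.computeIfAbsent(checkpointId, id -> new ArrayList<>()).addAll(records);
    }

    /** Commits everything staged up to and including checkpointId. */
    void notifyCheckpointComplete(long checkpointId) {
        commitUpTo(checkpointId);
    }

    /**
     * Called on recovery. The restored checkpoint was committed by the master,
     * so its staged data is committed even if no notification was ever received.
     */
    void recoverFrom(long restoredCheckpointId) {
        commitUpTo(restoredCheckpointId);
    }

    private void commitUpTo(long checkpointId) {
        // headMap(..., true) is a live view of all entries with key <= checkpointId.
        Map<Long, List<String>> ready = pending.headMap(checkpointId, true);
        for (List<String> records : ready.values()) {
            committed.addAll(records);
        }
        ready.clear(); // also removes the committed entries from pending
    }

    List<String> committed() {
        return committed;
    }

    int pendingCheckpoints() {
        return pending.size();
    }
}
```

With this shape, a lost {{notifyCheckpointComplete}} for checkpoint 1 is covered either by the notification for checkpoint 2 or by {{recoverFrom}} after a restart, and a repeated notification commits nothing twice.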
> Notify on checkpoint timeout
> -----------------------------
>
> Key: FLINK-6315
> URL: https://issues.apache.org/jira/browse/FLINK-6315
> Project: Flink
> Issue Type: New Feature
> Components: Core
> Reporter: Seth Wiesman
> Assignee: Seth Wiesman
>
> A common use case when writing a custom operator that outputs data to some
> third-party location is to partially output on checkpoint and then commit on
> notifyCheckpointComplete. If that external system does not gracefully handle
> rollbacks (such as Amazon S3 not allowing consistent delete operations), then
> that data needs to be handled by the next checkpoint.
> The idea is to add a new interface similar to CheckpointListener that
> provides a callback when the CheckpointCoordinator times out a checkpoint
> {code:java}
> /**
>  * This interface must be implemented by functions/operations that want to
>  * receive a notification if a checkpoint has been timed out by the
>  * {@link org.apache.flink.runtime.checkpoint.CheckpointCoordinator}.
>  */
> public interface CheckpointTimeoutListener {
>     /**
>      * This method is called as a notification if a distributed checkpoint
>      * has been timed out.
>      *
>      * @param checkpointId The ID of the checkpoint that has been timed out.
>      * @throws Exception
>      */
>     void notifyCheckpointTimeout(long checkpointId) throws Exception;
> }
> {code}
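
For illustration only, a minimal sketch of how an operator might use the proposed callback to carry data from a timed-out checkpoint into the next one. The interface is copied locally from the proposal so the example compiles without Flink; {{CarryOverSink}}, {{write}} and {{snapshot}} are hypothetical names, and the in-memory lists stand in for the external system. As the comment above points out, the timeout message may never arrive, so a real operator could not rely on this path alone.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Local copy of the proposed interface so the sketch compiles without Flink. */
interface CheckpointTimeoutListener {
    void notifyCheckpointTimeout(long checkpointId) throws Exception;
}

/** Hypothetical sink: stages output on checkpoint, commits on completion. */
class CarryOverSink implements CheckpointTimeoutListener {
    // Records received since the last snapshot.
    private final List<String> buffer = new ArrayList<>();
    // Partially output data, keyed by the checkpoint that staged it.
    private final Map<Long, List<String>> staged = new HashMap<>();
    // Staged data from timed-out checkpoints, waiting for the next checkpoint.
    private final List<String> carriedOver = new ArrayList<>();
    // Stand-in for data that is final in the external system.
    private final List<String> committed = new ArrayList<>();

    void write(String record) {
        buffer.add(record);
    }

    /** Partial output on checkpoint: carried-over data first, then new records. */
    void snapshot(long checkpointId) {
        List<String> batch = new ArrayList<>(carriedOver);
        batch.addAll(buffer);
        carriedOver.clear();
        buffer.clear();
        staged.put(checkpointId, batch);
    }

    /** Commit the batch staged for this checkpoint, if any. */
    void notifyCheckpointComplete(long checkpointId) {
        List<String> batch = staged.remove(checkpointId);
        if (batch != null) {
            committed.addAll(batch);
        }
    }

    /** Instead of rolling back, hand the staged batch to the next checkpoint. */
    @Override
    public void notifyCheckpointTimeout(long checkpointId) {
        List<String> batch = staged.remove(checkpointId);
        if (batch != null) {
            carriedOver.addAll(batch);
        }
    }

    List<String> committed() {
        return committed;
    }
}
```

In this sketch, records staged for checkpoint 1 are not deleted when checkpoint 1 times out; they are prepended to the batch of checkpoint 2 and become final once checkpoint 2 completes.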