[ https://issues.apache.org/jira/browse/FLINK-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897347#comment-15897347 ]
Stephan Ewen commented on FLINK-5962:
-------------------------------------
I have a pretty big change to the {{PendingCheckpoint}} and
{{CheckpointCoordinator}} coming up, which should go in first; otherwise we
would have to completely redo the timer patch anyway.
I think the fix for this issue is actually very small: it simply means adding
the cancellation timer to the {{PendingCheckpoint}} and cancelling it when
the pending checkpoint is disposed.
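A minimal sketch of that idea, assuming the timer is a JDK `java.util.concurrent.ScheduledThreadPoolExecutor`. `PendingCheckpointSketch` and `CoordinatorSketch` are hypothetical, simplified stand-ins, not Flink's actual classes or APIs: the pending checkpoint keeps the `ScheduledFuture` of its canceller and cancels it in `dispose()`, so completed checkpoints no longer leave a timeout task behind.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified stand-in for PendingCheckpoint (not Flink's real class).
class PendingCheckpointSketch {
    private ScheduledFuture<?> cancellerHandle; // timeout task registered for this checkpoint
    private boolean disposed;

    // Remembers the canceller; returns false if the checkpoint is already disposed.
    synchronized boolean setCancellerHandle(ScheduledFuture<?> handle) {
        if (disposed) {
            return false;
        }
        cancellerHandle = handle;
        return true;
    }

    synchronized boolean isDisposed() {
        return disposed;
    }

    // Called on completion, abort, or timeout. Cancelling the handle lets the timer
    // drop the canceller task instead of holding it until the timeout expires.
    synchronized void dispose() {
        if (disposed) {
            return;
        }
        disposed = true;
        if (cancellerHandle != null) {
            cancellerHandle.cancel(false); // do not interrupt a canceller that is already running
            cancellerHandle = null;
        }
    }
}

// Hypothetical coordinator fragment showing where the canceller handle is registered.
class CoordinatorSketch {
    private final ScheduledThreadPoolExecutor timer = new ScheduledThreadPoolExecutor(1);

    CoordinatorSketch() {
        // Without this policy, cancelled tasks stay queued until their delay elapses.
        timer.setRemoveOnCancelPolicy(true);
    }

    PendingCheckpointSketch trigger(long timeoutMillis) {
        PendingCheckpointSketch pending = new PendingCheckpointSketch();
        // The canceller disposes the checkpoint if it is still pending at the timeout.
        ScheduledFuture<?> canceller =
                timer.schedule(pending::dispose, timeoutMillis, TimeUnit.MILLISECONDS);
        if (!pending.setCancellerHandle(canceller)) {
            canceller.cancel(false); // checkpoint was disposed before the handle was stored
        }
        return pending;
    }

    int queuedCancellers() {
        return timer.getQueue().size();
    }

    void shutdown() {
        timer.shutdownNow();
    }
}
```

Triggering and immediately disposing many checkpoints with this sketch leaves the timer queue empty, whereas only checkpoints that are still pending keep a queued canceller. Whether the remove-on-cancel policy is also needed in Flink depends on which timer implementation the coordinator actually uses; that part is an assumption of the sketch.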
My change will only go into {{master}}, so creating a patch for the
{{release-1.2}} branch should be fine.
> Cancel checkpoint canceller tasks in CheckpointCoordinator
> ----------------------------------------------------------
>
> Key: FLINK-5962
> URL: https://issues.apache.org/jira/browse/FLINK-5962
> Project: Flink
> Issue Type: Bug
> Components: State Backends, Checkpointing
> Affects Versions: 1.2.0, 1.3.0
> Reporter: Till Rohrmann
> Priority: Critical
>
> The {{CheckpointCoordinator}} registers a canceller task for each running
> checkpoint. The canceller task's responsibility is to cancel a checkpoint if
> it takes too long to complete. We should cancel this task as soon as the
> checkpoint has completed; otherwise many canceller tasks accumulate, which
> can eventually lead to an OOM exception.
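The accumulation described in the issue can be reproduced with a plain JDK timer. This is an illustrative sketch, not Flink code: it assumes a `ScheduledThreadPoolExecutor` as the timer and uses a hypothetical 10-minute timeout. Each checkpoint schedules one no-op "canceller"; unless the returned `ScheduledFuture` is cancelled, every task stays in the timer queue (keeping its `Runnable`, and anything that `Runnable` references, reachable) until its delay expires.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrates the canceller leak with a plain JDK timer; not Flink code.
class CancellerLeakDemo {

    // Schedules one canceller per checkpoint with a long timeout and returns how many
    // tasks are still queued afterwards. If cancelOnCompletion is true, each canceller
    // is cancelled right away, mimicking a checkpoint that completes quickly.
    static int queuedAfter(int checkpoints, boolean cancelOnCompletion) {
        ScheduledThreadPoolExecutor timer = new ScheduledThreadPoolExecutor(1);
        timer.setRemoveOnCancelPolicy(true); // cancelled tasks leave the queue immediately
        try {
            for (int i = 0; i < checkpoints; i++) {
                ScheduledFuture<?> canceller = timer.schedule(() -> { }, 10, TimeUnit.MINUTES);
                if (cancelOnCompletion) {
                    canceller.cancel(false);
                }
            }
            return timer.getQueue().size();
        } finally {
            timer.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("without cancel: " + queuedAfter(10_000, false)); // prints 10000
        System.out.println("with cancel:    " + queuedAfter(10_000, true));  // prints 0
    }
}
```

With frequent checkpoints and a long timeout, the uncancelled queue grows linearly with the number of completed checkpoints, which is the growth that can end in the OOM described above.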
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)