[
https://issues.apache.org/jira/browse/FLINK-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15926476#comment-15926476
]
ASF GitHub Bot commented on FLINK-5962:
---------------------------------------
Github user tillrohrmann commented on a diff in the pull request:
https://github.com/apache/flink/pull/3548#discussion_r106209461
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/PendingCheckpoint.java ---
@@ -427,8 +446,23 @@ public void run() {
discarded = true;
notYetAcknowledgedTasks.clear();
acknowledgedTasks.clear();
+ cancelCanceller();
+ }
+ }
+ }
+
+ private void cancelCanceller() {
+ try {
+ final ScheduledFuture<?> canceller = this.cancellerHandle;
+ if (canceller != null) {
+ this.cancellerHandle = null;
+ canceller.cancel(false);
}
}
+ catch (Exception e) {
+ // this code should not throw exceptions
+ LOG.warn("Error while cancelling checkpoint timeout task");
--- End diff --
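`e` is swallowed: the caught exception is not passed to `LOG.warn`, so its message and stack trace never reach the log.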
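One way to address this is to hand the exception to the logger as well. The sketch below is only an illustration, not the code in the pull request: `PendingCheckpointSketch` and `setCancellerHandle` are hypothetical names, and `java.util.logging` stands in for the logger used in the diff so that the example has no external dependencies. With an slf4j-style logger the equivalent call would be `LOG.warn("...", e)`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the class in the diff; only the canceller handling is modelled.
class PendingCheckpointSketch {

    private static final Logger LOG =
            Logger.getLogger(PendingCheckpointSketch.class.getName());

    // Handle of the scheduled timeout task, or null if none is registered.
    private volatile ScheduledFuture<?> cancellerHandle;

    // Hypothetical setter, added here only so the sketch can be exercised.
    void setCancellerHandle(ScheduledFuture<?> handle) {
        this.cancellerHandle = handle;
    }

    void cancelCanceller() {
        try {
            final ScheduledFuture<?> canceller = this.cancellerHandle;
            if (canceller != null) {
                this.cancellerHandle = null;
                // false: do not interrupt the task if it is already running
                canceller.cancel(false);
            }
        } catch (Exception e) {
            // Pass the exception along so its message and stack trace are logged.
            LOG.log(Level.WARNING, "Error while cancelling checkpoint timeout task", e);
        }
    }
}
```

Calling `cancelCanceller()` a second time is a no-op here, because the handle is cleared before the future is cancelled.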
> Cancel checkpoint canceller tasks in CheckpointCoordinator
> ----------------------------------------------------------
>
> Key: FLINK-5962
> URL: https://issues.apache.org/jira/browse/FLINK-5962
> Project: Flink
> Issue Type: Bug
> Components: State Backends, Checkpointing
> Affects Versions: 1.2.0, 1.3.0
> Reporter: Till Rohrmann
> Assignee: Stephan Ewen
> Priority: Critical
>
> The {{CheckpointCoordinator}} registers a canceller task for each running
> checkpoint. The canceller task's responsibility is to cancel a checkpoint if
> it takes too long to complete. We should cancel this task as soon as the
> checkpoint has been completed, because otherwise we will keep many canceller
> tasks around. This can eventually lead to an OOM exception.
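The behaviour described in the issue can be sketched with a plain `ScheduledThreadPoolExecutor`. This is a minimal illustration under stated assumptions, not Flink's implementation: `CheckpointTimeoutSketch`, `registerCanceller`, `checkpointCompleted`, and `pendingCancellers` are hypothetical names, and using `setRemoveOnCancelPolicy(true)` is an assumption of this sketch about how cancelled tasks get dropped from the timer queue right away.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical coordinator-like class; models only the timeout bookkeeping.
class CheckpointTimeoutSketch {

    private final ScheduledThreadPoolExecutor timer = new ScheduledThreadPoolExecutor(1);

    CheckpointTimeoutSketch() {
        // Assumption of this sketch: remove cancelled tasks from the work queue
        // immediately rather than keeping them until their delay elapses.
        timer.setRemoveOnCancelPolicy(true);
    }

    // Registers a canceller that fires if the checkpoint takes too long.
    ScheduledFuture<?> registerCanceller(Runnable onTimeout, long timeoutMillis) {
        return timer.schedule(onTimeout, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Called when the checkpoint completes in time: the canceller is no longer needed.
    void checkpointCompleted(ScheduledFuture<?> canceller) {
        canceller.cancel(false);
    }

    // Number of canceller tasks still queued in the timer.
    int pendingCancellers() {
        return timer.getQueue().size();
    }

    void shutdown() {
        timer.shutdownNow();
    }
}
```

Without the call to `checkpointCompleted`, each finished checkpoint would leave its canceller queued until the timeout expires, which is the accumulation the issue describes.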
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)