[ https://issues.apache.org/jira/browse/FLINK-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15708912#comment-15708912 ]
ASF GitHub Bot commented on FLINK-5158:
---------------------------------------
Github user uce commented on a diff in the pull request:
https://github.com/apache/flink/pull/2873#discussion_r90258544
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java ---
@@ -731,46 +700,100 @@ public boolean receiveAcknowledgeMessage(AcknowledgeCheckpoint message) throws E
discardState(message.getState());
}
+
+ return true;
}
else if (checkpoint != null) {
// this should not happen
throw new IllegalStateException(
"Received message for discarded but non-removed checkpoint " + checkpointId);
}
else {
+ boolean wasPendingCheckpoint;
+
// message is for an unknown checkpoint, or comes too late (checkpoint disposed)
if (recentPendingCheckpoints.contains(checkpointId)) {
- isPendingCheckpoint = true;
+ wasPendingCheckpoint = true;
LOG.warn("Received late message for now expired checkpoint attempt {}.", checkpointId);
}
else {
LOG.debug("Received message for an unknown checkpoint {}.", checkpointId);
- isPendingCheckpoint = false;
+ wasPendingCheckpoint = false;
}
// try to discard the state so that we don't have lingering state lying around
discardState(message.getState());
+
+ return wasPendingCheckpoint;
+ }
+ }
+ }
+
+ private void completePendingCheckpoint(PendingCheckpoint pendingCheckpoint) throws CheckpointException {
--- End diff --
The JavaDocs are missing here. Maybe add a note that this method needs to be
called within the scope of the checkpoint lock.
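
To illustrate the kind of documentation asked for above, here is a minimal sketch. It is not the code from the pull request: the class `LockScopedCoordinatorSketch`, the field `lock`, and the helper `completeWithoutLock` are hypothetical stand-ins, and the runtime check via `Thread.holdsLock` is an optional extra on top of the JavaDoc note, assuming the lock is a plain Java monitor.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the coordinator; NOT Flink's actual class. */
class LockScopedCoordinatorSketch {

    /** Guards all checkpoint bookkeeping (field name is an assumption). */
    private final Object lock = new Object();

    private final List<Long> completed = new ArrayList<>();

    /** Public entry point: acquires the lock, then delegates to the private helper. */
    boolean receiveAcknowledge(long checkpointId) {
        synchronized (lock) {
            completePendingCheckpoint(checkpointId);
            return true;
        }
    }

    /**
     * Completes the given pending checkpoint.
     *
     * <p><strong>Important:</strong> This method must be called while holding
     * the checkpoint lock.
     *
     * @param checkpointId id of the checkpoint to complete
     * @throws IllegalStateException if the caller does not hold the lock
     */
    private void completePendingCheckpoint(long checkpointId) {
        // Optional runtime guard that backs up the documented contract.
        if (!Thread.holdsLock(lock)) {
            throw new IllegalStateException("Must be called in checkpoint lock scope");
        }
        completed.add(checkpointId);
    }

    /** Hypothetical hook that deliberately skips the lock, to show the guard firing. */
    void completeWithoutLock(long checkpointId) {
        completePendingCheckpoint(checkpointId);
    }

    int completedCount() {
        synchronized (lock) {
            return completed.size();
        }
    }
}
```

Calling `receiveAcknowledge` goes through the lock and succeeds, while `completeWithoutLock` trips the guard. Whether a runtime check is worth its cost, compared to the JavaDoc note alone, is a design choice left to the author.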
> Handle ZooKeeperCompletedCheckpointStore exceptions in CheckpointCoordinator
> ----------------------------------------------------------------------------
>
> Key: FLINK-5158
> URL: https://issues.apache.org/jira/browse/FLINK-5158
> Project: Flink
> Issue Type: Bug
> Components: State Backends, Checkpointing
> Affects Versions: 1.2.0, 1.1.3
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
> Fix For: 1.2.0, 1.1.4
>
>
> The checkpoint coordinator does not properly handle exceptions when trying to
> store completed checkpoints. As a result, completed checkpoints are not
> properly cleaned up and, even worse, the {{CheckpointCoordinator}} might get
> stuck and stop triggering checkpoints.
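
The failure mode described in the issue can be sketched as follows. This is a simplified model under stated assumptions, not the actual fix: `StoreFailureSketch`, `CompletedStore`, and all method names are hypothetical, and the real pull request may handle the error differently (for example by throwing an exception rather than returning a flag). The point is only that a failing store should neither leave state behind nor leave the checkpoint registered as pending.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model; none of these types are Flink's real API. */
class StoreFailureSketch {

    /** Stand-in for a completed checkpoint store that may fail, e.g. one backed by ZooKeeper. */
    interface CompletedStore {
        void add(long checkpointId) throws Exception;
    }

    private final Object lock = new Object();
    private final Map<Long, String> pending = new HashMap<>();
    private final List<Long> discarded = new ArrayList<>();
    private final CompletedStore store;

    StoreFailureSketch(CompletedStore store) {
        this.store = store;
    }

    /** Registers a pending checkpoint together with some opaque state handle. */
    void trigger(long checkpointId, String state) {
        synchronized (lock) {
            pending.put(checkpointId, state);
        }
    }

    /** Returns true if the checkpoint was stored, false if storing failed and it was discarded. */
    boolean complete(long checkpointId) {
        synchronized (lock) {
            if (!pending.containsKey(checkpointId)) {
                throw new IllegalArgumentException("Unknown checkpoint " + checkpointId);
            }
            try {
                store.add(checkpointId);
                return true;
            } catch (Exception e) {
                // Storing failed: discard the state so nothing lingers.
                discarded.add(checkpointId);
                return false;
            } finally {
                // Always drop the pending entry so later checkpoints are not blocked.
                pending.remove(checkpointId);
            }
        }
    }

    int pendingCount() {
        synchronized (lock) {
            return pending.size();
        }
    }

    List<Long> discardedIds() {
        synchronized (lock) {
            return new ArrayList<>(discarded);
        }
    }
}
```

With a store that always throws, `complete` returns false, the checkpoint id shows up in `discardedIds()`, and `pendingCount()` drops back to zero, so the coordinator model is free to trigger the next checkpoint.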