yanghua commented on a change in pull request #8322: [FLINK-12364] Introduce a CheckpointFailureManager to centralized manage checkpoint failure URL: https://github.com/apache/flink/pull/8322#discussion_r284102152
########## File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java ########## @@ -435,6 +439,12 @@ public boolean triggerCheckpoint(long timestamp, boolean isPeriodic) { triggerCheckpoint(timestamp, checkpointProperties, null, isPeriodic, false); return true; } catch (CheckpointException e) { + try { + long latestGeneratedCheckpointId = getCheckpointIdCounter().getAndIncrement(); Review comment: > What I meant is removing the `try-catch(CheckpointException)`around the `triggerCheckpoint` call completely and exchange the `throw`s inside the method by direct calls to the failure handler. I have accepted all the suggestion, except this one, because when I was refactoring, I found the complexity may exceed our imagination, some obstacle: 1) ``` public boolean triggerCheckpoint(long timestamp, boolean isPeriodic) { ``` This method returns a boolean value, so it needs a binary state, when we catch `CheckpointException` we return false and when we get `PendingCheckpoint` we return true. If the real triggerCheckpoint method does not throw `CheckpointException` how two distinguish binary state? 2)The triggerCheckpoint's boolean return value has been used in so many test case to judge whether a checkpoint is successful or not. 3) The `triggerSavepointInternal` method need to catch the `CheckpointException` to return a completed exceptionally for the future object, that means `triggerSavepointInternal` also needs a binary state for `CompletableFuture<CompletedCheckpoint>` 4) I have implemented `CheckpointIDCounter#get`, after providing this function. I think report failure centralized would be better than scattered code, just like we provide a unified `PendingCheckpoint#abort` before. What do you think? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services