[GitHub] [flink] yanghua commented on a change in pull request #8322: [FLINK-12364] Introduce a CheckpointFailureManager to centralized manage checkpoint failure

GitBox Tue, 14 May 2019 23:26:30 -0700

yanghua commented on a change in pull request #8322: [FLINK-12364] Introduce a 
CheckpointFailureManager to centralized manage checkpoint failure
URL: https://github.com/apache/flink/pull/8322#discussion_r284102152


 ##########
 File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java
 ##########
 @@ -435,6 +439,12 @@ public boolean triggerCheckpoint(long timestamp, boolean 
isPeriodic) {
                        triggerCheckpoint(timestamp, checkpointProperties, 
null, isPeriodic, false);
                        return true;
                } catch (CheckpointException e) {
+                       try {
+                               long latestGeneratedCheckpointId = 
getCheckpointIdCounter().getAndIncrement();
 
 Review comment:
   > What I meant is removing the `try-catch(CheckpointException)`around the 
`triggerCheckpoint` call completely and exchange the `throw`s inside the method 
by direct calls to the failure handler.
   
   I have accepted all the suggestion, except this one, because when I was 
refactoring, I found the complexity may exceed our imagination, some obstacle:
   
   1)   
   ```
   public boolean triggerCheckpoint(long timestamp, boolean isPeriodic) {
   ```
   
   This method returns a boolean value, so it needs a binary state, when we 
catch `CheckpointException` we return false and when we get `PendingCheckpoint` 
we return true. If the real triggerCheckpoint method does not throw 
`CheckpointException` how two distinguish binary state?
   
   2) The triggerCheckpoint's boolean return value has been used in so many 
test case to judge whether a checkpoint is successful or not.
   
   3) The `triggerSavepointInternal` method need to catch the 
`CheckpointException` to return a completed exceptionally for the future 
object, that means `triggerSavepointInternal` also needs a binary state for 
`CompletableFuture<CompletedCheckpoint>`
   
   4) I have implemented `CheckpointIDCounter#get`, after providing this 
function. I think report failure centralized would be better than scattered 
code, just like we provide a unified `PendingCheckpoint#abort` before.
   
   What do you think?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] yanghua commented on a change in pull request #8322: [FLINK-12364] Introduce a CheckpointFailureManager to centralized manage checkpoint failure

Reply via email to