[GitHub] [flink] akalash commented on a change in pull request #18845: [FLINK-25958][runtime] Report completed checkpoint to statistic only after the completed checkpoint is added to the store.

GitBox Mon, 21 Feb 2022 09:04:29 -0800


akalash commented on a change in pull request #18845:
URL: https://github.com/apache/flink/pull/18845#discussion_r811311861




##########
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java
##########
@@ -1385,6 +1385,7 @@ private CompletedCheckpoint 
addCompletedCheckpointToStoreAndSubsumeOldest(
                 
checkpointsCleaner.cleanCheckpointOnFailedStoring(completedCheckpoint, 
executor);
             }
 
+            reportFailedCheckpoint(completedCheckpoint, exception);

Review comment:
       I agree that it is better to keep both 'success' and 'fail' reporting in 
one place. It is why I wanted to move it to `completePendingCheckpoint` but it 
is not possible since `finalizeCheckpoint` can be called independently so 
'fail' reporting should be inside of `finalizeCheckpoint`. So right now we 
report about the fail from `finalizeCheckpoint` and 
`addCompletedCheckpointToStoreAndSubsumeOldest` and also we report about 
'success' from `completePendingCheckpoint`. I don't really like it but I don't 
think that moving 'succes' reporting to 
`addCompletedCheckpointToStoreAndSubsumeOldest` somehow improve the situation 
since the reporting should be the last step in the finalization of the 
checkpoint but moving this step into the method which is the last for now but 
can be changed in future seems not so promising

##########
File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorTest.java
##########
@@ -2083,6 +2094,12 @@ public void testTriggerAndConfirmSimpleSavepoint() 
throws Exception {
         assertEquals(pending.getCheckpointId(), success.getCheckpointID());
         assertEquals(2, success.getOperatorStates().size());
 
+        AbstractCheckpointStats actualStats =
+                
statsTracker.createSnapshot().getHistory().getCheckpointById(checkpointId);
+
+        assertEquals(checkpointId, actualStats.getCheckpointId());
+        assertEquals(CheckpointStatsStatus.COMPLETED, actualStats.getStatus());
+

Review comment:
       No, this test doesn't fail in master. I just notice that we have no test 
which checks the success scenario. So this extra condition just checks that we 
switch status to `COMPLETED` when the checkpoint is successful.
   
   Yes, the next commit adds the actual regression test which fails with the 
current master. Technically, we can add that test to this commit but instead of 
`FAILED` status we should check for NOT `COMPLETED` or `IN_PROGRESS` status 
since this commit has one more bug which is resolved in next commit.(but I 
decided to add the correct test at once to next commit)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [flink] akalash commented on a change in pull request #18845: [FLINK-25958][runtime] Report completed checkpoint to statistic only after the completed checkpoint is added to the store.

Reply via email to