akalash commented on a change in pull request #18852:
URL: https://github.com/apache/flink/pull/18852#discussion_r818512895



##########
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java
##########
@@ -545,14 +545,12 @@ private void 
startTriggeringCheckpoint(CheckpointTriggerRequest request) {
                             .thenApplyAsync(
                                     plan -> {
                                         try {
-                                            CheckpointIdAndStorageLocation
-                                                    
checkpointIdAndStorageLocation =
-                                                            
initializeCheckpoint(
-                                                                    
request.props,
-                                                                    
request.externalSavepointLocation,
-                                                                    
initializeBaseLocations);
-                                            return new Tuple2<>(
-                                                    plan, 
checkpointIdAndStorageLocation);

Review comment:
       > initializing the storage location was moved after creating pending 
checkpoint due to error handling reasons, right? If so, why leaving 
checkpointIdCounter.getAndIncrement() here doesn't cause the same problems?
   
   Yes, it is correct. It will be the same problem for getAndIncrement but 
these changes just try to reduce the probability of this event, and also if we 
don't even have the checkpoint id then there is nothing to send to the 
statistic.
   
   > Wouldn't an alternate solution to this problem be to have somewhere in the 
error handling logic, if we are handling an error for a checkpoint that's 
missing PendingCheckpoint, we lazily/on-demand create one?
   
   I thought about that but I didn't find an easy way to do so. I mean right 
now we lose all information if we fail before the PendingCheckpoint. I also 
think that PendingCheckpoint is too complex an object for such fake creation. I 
am maybe wrong but it looks like it requires a lot of refactoring. Considering 
that this ticket is about logging, statistic but not about PendingCheckpoint 
lifecycle, I decided that we can do more deep refactoring in the other ticket 
if we think that it makes sense.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to