rkhachatryan commented on a change in pull request #12670:
URL: https://github.com/apache/flink/pull/12670#discussion_r441346798



##########
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java
##########
@@ -538,51 +542,61 @@ private void 
startTriggeringCheckpoint(CheckpointTriggerRequest request) {
                                                                        
coordinatorsToCheckpoint, pendingCheckpoint, timer),
                                                        timer);
 
-                       FutureUtils.assertNoException(
-                               CompletableFuture.allOf(masterStatesComplete, 
coordinatorCheckpointsComplete)
-                                       .handleAsync(
-                                               (ignored, throwable) -> {
-                                                       final PendingCheckpoint 
checkpoint =
-                                                               
FutureUtils.getWithoutException(pendingCheckpointCompletableFuture);
-
-                                                       
Preconditions.checkState(
-                                                               checkpoint != 
null || throwable != null,
-                                                               "Either the 
pending checkpoint needs to be created or an error must have been occurred.");
-
-                                                       if (throwable != null) {
-                                                               // the 
initialization might not be finished yet
-                                                               if (checkpoint 
== null) {
-                                                                       
onTriggerFailure(request, throwable);
-                                                               } else {
-                                                                       
onTriggerFailure(checkpoint, throwable);
-                                                               }
+                       FutureUtils.waitForAll(asList(masterStatesComplete, 
coordinatorCheckpointsComplete))
+                               .handleAsync(
+                                       (ignored, throwable) -> {
+                                               final PendingCheckpoint 
checkpoint =
+                                                       
FutureUtils.getWithoutException(pendingCheckpointCompletableFuture);
+
+                                               Preconditions.checkState(
+                                                       checkpoint != null || 
throwable != null,
+                                                       "Either the pending 
checkpoint needs to be created or an error must have been occurred.");
+
+                                               if (throwable != null) {
+                                                       // the initialization 
might not be finished yet
+                                                       if (checkpoint == null) 
{
+                                                               
onTriggerFailure(request, throwable);
                                                        } else {
-                                                               if 
(checkpoint.isDiscarded()) {
-                                                                       
onTriggerFailure(
-                                                                               
checkpoint,
-                                                                               
new CheckpointException(
-                                                                               
        CheckpointFailureReason.TRIGGER_CHECKPOINT_FAILURE,
-                                                                               
        checkpoint.getFailureCause()));
-                                                               } else {
-                                                                       // no 
exception, no discarding, everything is OK
-                                                                       final 
long checkpointId = checkpoint.getCheckpointId();
-                                                                       
snapshotTaskState(
-                                                                               
timestamp,
-                                                                               
checkpointId,
-                                                                               
checkpoint.getCheckpointStorageLocation(),
-                                                                               
request.props,
-                                                                               
executions,
-                                                                               
request.advanceToEndOfTime);
-
-                                                                       
coordinatorsToCheckpoint.forEach((ctx) -> 
ctx.afterSourceBarrierInjection(checkpointId));
-
-                                                                       
onTriggerSuccess();
-                                                               }
+                                                               
onTriggerFailure(checkpoint, throwable);
                                                        }
+                                               } else {
+                                                       if 
(checkpoint.isDiscarded()) {
+                                                               
onTriggerFailure(
+                                                                       
checkpoint,
+                                                                       new 
CheckpointException(
+                                                                               
CheckpointFailureReason.TRIGGER_CHECKPOINT_FAILURE,
+                                                                               
checkpoint.getFailureCause()));
+                                                       } else {
+                                                               // no 
exception, no discarding, everything is OK
+                                                               final long 
checkpointId = checkpoint.getCheckpointId();
+                                                               
snapshotTaskState(
+                                                                       
timestamp,
+                                                                       
checkpointId,
+                                                                       
checkpoint.getCheckpointStorageLocation(),
+                                                                       
request.props,
+                                                                       
executions,
+                                                                       
request.advanceToEndOfTime);
+
+                                                               
coordinatorsToCheckpoint.forEach((ctx) -> 
ctx.afterSourceBarrierInjection(checkpointId));
+
+                                                               
onTriggerSuccess();
+                                                       }
+                                               }
 
-                                                       return null;
-                                               },
-                                               timer));
+                                               return null;
+                                       },
+                                       timer)
+                               .whenComplete((unused, error) -> {
+                                       if (error != null) {
+                                               if (!isShutdown()) {
+                                                       
failureManager.handleJobLevelCheckpointException(new 
CheckpointException(EXCEPTION, error), Optional.empty());
+                                               } else if (error instanceof 
RejectedExecutionException) {

Review comment:
       > But once the CheckpointCoordinator is shut down, these exceptions 
should no longer matter, at least not in terms of correctness of the job. 
   
   So we are talkiing about the case `isShutdown() == true` branch. 
   I agree about job correctness and in this case exceptions are only logged.
   I think this can be helpful during troubleshooting or debugging.
   
   > I don't really understand why this was not a problem before and now we 
want to fine grained filter out exceptions, this does not seem like a straight 
forward solution to me.
   
   We didn't have this code in previous releases. `CheckpointCoordinator` was 
heavily refactored this release (than partially reverted). During the 
development, we did have problem recently and had to guess that there is an NPE.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to