yunfengzhou-hub commented on code in PR #20454:
URL: https://github.com/apache/flink/pull/20454#discussion_r939861202


##########
flink-runtime/src/main/java/org/apache/flink/runtime/operators/coordination/OperatorCoordinatorHolder.java:
##########
@@ -298,6 +322,86 @@ public void resetToCheckpoint(long checkpointId, @Nullable 
byte[] checkpointData
         coordinator.resetToCheckpoint(checkpointId, checkpointData);
     }
 
+    private void closeGatewayAndCheckpointCoordinator(
+            long checkpointId, CompletableFuture<byte[]> result) {
+        mainThreadExecutor.assertRunningInMainThread();
+
+        latestAttemptedCheckpointId = checkpointId;
+
+        acknowledgeCloseGatewayFutureMap.clear();
+        for (int subtask : subtaskGatewayMap.keySet()) {
+            acknowledgeCloseGatewayFutureMap.put(subtask, new 
CompletableFuture<>());
+            subtaskGatewayMap
+                    .get(subtask)
+                    .sendEventWithCallBackOnCompletion(
+                            new CloseGatewayEvent(checkpointId),
+                            (success, failure) -> {
+                                if (failure != null) {
+                                    // The close gateway event failed to reach 
the subtask for some
+                                    // reason. For example, the subtask has 
finished. In this case
+                                    // it is guaranteed that the coordinator 
won't receive more
+                                    // events from this subtask before the 
current checkpoint
+                                    // finishes, which is equivalent to 
receiving ACK from this
+                                    // subtask.
+                                    
completeAcknowledgeCloseGatewayFuture(subtask, checkpointId);
+
+                                    if (!(failure instanceof 
RejectedExecutionException

Review Comment:
   As the infrastructure code introduced in this PR would send close gateway 
events to all of a coordinator's subtasks for better simplicity on the API 
changes, regardless of whether a subtask has finished or not, it would be 
normal for the coordinator to receive exceptions related to tasks not in 
running state. Thus we should explicitly catch this exception here. As the 
subtasks are already not running when close gateway events are sent, there is 
no need to trigger a failover on that subtask.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to