zhuzhurk commented on code in PR #21970:
URL: https://github.com/apache/flink/pull/21970#discussion_r1113780430


##########
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultScheduler.java:
##########
@@ -377,6 +377,10 @@ private void restartTasks(
         final Set<ExecutionVertexID> verticesToRestart =
                 executionVertexVersioner.getUnmodifiedExecutionVertices(executionVertexVersions);
 
+        if (verticesToRestart.isEmpty()) {
+            return;

Review Comment:
   A global failover can be superseded by a regional failover with respect to 
the tasks to restart.
   Here's an example: a job consists of only one pipelined region. A global 
failure happens first (caused by the OperatorCoordinator) and needs to restart 
all the tasks. It also needs `OperatorCoordinatorHolder#resetToCheckpoint()` 
to be invoked to recover from an inconsistent state. However, a task failure 
happens later, at almost the same time, and it also needs to restart all the 
tasks. Therefore, `verticesToRestart` would be empty when `restartTasks(...)` 
is invoked for the global failure, and with this early return 
`OperatorCoordinatorHolder#resetToCheckpoint()` would not be invoked.
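
   To make the interleaving concrete, here is a minimal toy sketch of the 
scenario. It is not Flink code: `ToyVersioner`, `simulate(...)` and the vertex 
names are hypothetical stand-ins, and the assumption is only that each failure 
bumps a per-vertex version and that the restart callback keeps the vertices 
whose version is unchanged.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the race described above; all names here are hypothetical. */
public class SupersededRestartSketch {

    /** Hypothetical stand-in for the versioner: one version counter per vertex. */
    static final class ToyVersioner {
        private final Map<String, Integer> versions = new HashMap<>();

        /** Bumps the version of each vertex and returns a snapshot of the new versions. */
        Map<String, Integer> recordModification(Set<String> vertices) {
            Map<String, Integer> snapshot = new HashMap<>();
            for (String vertex : vertices) {
                snapshot.put(vertex, versions.merge(vertex, 1, Integer::sum));
            }
            return snapshot;
        }

        /** Returns the vertices whose current version still matches the snapshot. */
        Set<String> getUnmodified(Map<String, Integer> snapshot) {
            Set<String> unmodified = new HashSet<>();
            for (Map.Entry<String, Integer> entry : snapshot.entrySet()) {
                if (entry.getValue().equals(versions.get(entry.getKey()))) {
                    unmodified.add(entry.getKey());
                }
            }
            return unmodified;
        }
    }

    /** Returns true if the (simulated) coordinator reset ran for the global failure. */
    static boolean simulate(boolean earlyReturnOnEmpty) {
        ToyVersioner versioner = new ToyVersioner();
        // A single pipelined region, so both failures affect every vertex.
        Set<String> allVertices = new HashSet<>(Arrays.asList("v1", "v2"));

        // 1. Global failure: versions are recorded, the restart is scheduled for later.
        Map<String, Integer> globalSnapshot = versioner.recordModification(allVertices);

        // 2. Task failure arrives before that restart runs and bumps the same vertices.
        versioner.recordModification(allVertices);

        // 3. The restart callback of the global failure finally runs.
        Set<String> verticesToRestart = versioner.getUnmodified(globalSnapshot);
        if (earlyReturnOnEmpty && verticesToRestart.isEmpty()) {
            return false; // the coordinator reset below is skipped
        }
        // Stands in for the coordinator reset that the global failure requires.
        return true;
    }

    public static void main(String[] args) {
        System.out.println("with early return, coordinator reset ran: " + simulate(true));
        System.out.println("without early return, coordinator reset ran: " + simulate(false));
    }
}
```

   In this toy model the reset is skipped only when the early return is 
present, which is the concern with returning before the global-recovery 
handling has had a chance to run.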



