metaswirl commented on a change in pull request #18689:
URL: https://github.com/apache/flink/pull/18689#discussion_r805592737



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/StateWithExecutionGraph.java
##########
@@ -306,22 +331,88 @@ void deliverOperatorEventToCoordinator(
                 operatorId, request);
     }
 
+    /** Transition to different state when failure occurs. Stays in the same state by default. */
+    abstract void onFailure(Throwable cause);
+
+    /**
+     * Transition to different state when the execution graph reaches a globally terminal state.
+     *
+     * @param globallyTerminalState globally terminal state which the execution graph reached
+     */
+    abstract void onGloballyTerminalState(JobStatus globallyTerminalState);
+
+    @Override
+    public void handleGlobalFailure(Throwable cause) {
+        failureCollection.add(new GlobalFailure(cause));
+        onFailure(cause);
+    }
+
     /**
      * Updates the execution graph with the given task execution state transition.
      *
      * @param taskExecutionStateTransition taskExecutionStateTransition to update the ExecutionGraph
      *     with
      * @return {@code true} if the update was successful; otherwise {@code false}
      */
-    abstract boolean updateTaskExecutionState(
-            TaskExecutionStateTransition taskExecutionStateTransition);
+    boolean updateTaskExecutionState(TaskExecutionStateTransition taskExecutionStateTransition) {

Review comment:
       This method looks complex because it handles so many edge cases.
   
   1. L366: `updateState` with state `FAILED` de-registers the `Execution`, so 
we need to collect the `ExecutionVertexID` beforehand.
   2. L366: `updateTaskExecutionState` can be called multiple times with state 
`FAILED`. After the first call, the `Execution` is already de-registered, so 
we need to use an `Optional` for the ID here.
   3. L376: The `updateState` call returns `false` if no `Execution` is 
present. Hence, in line 376, the ID should always be available; anything else 
would be an unexpected state.
   4. L379: If the `Execution` is in state `CANCELING` (or certain other 
states) before the `updateState` call, the failure is not stored on the 
`Execution`. We currently ignore these failures.
   
   For now, I propose keeping the code as it is and untangling these edge 
cases afterwards.
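
To make the four cases above easier to follow, here is a minimal, self-contained sketch of the control flow. It is not the code from this PR: `UpdateTaskStateSketch`, `Graph`, `Execution`, `findVertexId`, `storedFailures`, and the string IDs are hypothetical stand-ins, and the toy `updateState` only models the behavior described in the list (for example, it treats `CANCELING` as the only state in which the failure is not stored).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins only; none of these types are the real Flink classes.
class UpdateTaskStateSketch {
    enum State { RUNNING, CANCELING, FAILED }

    /** Toy execution: knows its vertex ID and its current state. */
    static final class Execution {
        final String vertexId;
        State state;

        Execution(String vertexId, State state) {
            this.vertexId = vertexId;
            this.state = state;
        }
    }

    /** Toy graph keyed by attempt ID (assumed shape, for illustration). */
    static final class Graph {
        final Map<String, Execution> registered = new HashMap<>();
        final Map<String, Throwable> storedFailures = new HashMap<>();

        Optional<String> findVertexId(String attemptId) {
            return Optional.ofNullable(registered.get(attemptId)).map(e -> e.vertexId);
        }

        boolean updateState(String attemptId, State newState, Throwable cause) {
            Execution execution = registered.get(attemptId);
            if (execution == null) {
                return false; // case 3: no Execution present
            }
            if (newState == State.FAILED) {
                if (execution.state != State.CANCELING) {
                    storedFailures.put(execution.vertexId, cause); // case 4: skipped while CANCELING
                }
                registered.remove(attemptId); // case 1: FAILED de-registers the Execution
            }
            execution.state = newState;
            return true;
        }
    }

    final Graph graph = new Graph();
    final List<String> failedVertexIds = new ArrayList<>();

    boolean updateTaskExecutionState(String attemptId, State newState, Throwable cause) {
        // Cases 1 and 2: look up the ID before updateState; it may already be gone.
        Optional<String> vertexId = graph.findVertexId(attemptId);
        if (!graph.updateState(attemptId, newState, cause)) {
            return false;
        }
        if (newState == State.FAILED) {
            // Case 3: a successful update implies the Execution was registered.
            String id = vertexId.orElseThrow(
                    () -> new IllegalStateException("Updated an unknown execution"));
            // Case 4: failures that were not stored on the Execution are ignored.
            if (graph.storedFailures.containsKey(id)) {
                failedVertexIds.add(id);
            }
        }
        return true;
    }

    public static void main(String[] args) {
        UpdateTaskStateSketch sketch = new UpdateTaskStateSketch();
        sketch.graph.registered.put("attempt-1", new Execution("vertex-1", State.RUNNING));
        Throwable cause = new RuntimeException("boom");
        System.out.println(sketch.updateTaskExecutionState("attempt-1", State.FAILED, cause));
        System.out.println(sketch.updateTaskExecutionState("attempt-1", State.FAILED, cause));
        System.out.println(sketch.failedVertexIds);
    }
}
```

In this sketch the second `FAILED` update for the same attempt returns `false` without throwing, because the `Optional` is only unwrapped after `updateState` has succeeded.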




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]