XComp commented on a change in pull request #14798: URL: https://github.com/apache/flink/pull/14798#discussion_r569986337
##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/SchedulerBase.java
##########
@@ -522,6 +527,7 @@ protected ComponentMainThreadExecutor getMainThreadExecutor() {
     protected void failJob(Throwable cause) {
         incrementVersionsOfAllVertices();
         executionGraph.failJob(cause);
+        getTerminationFuture().thenRun(() -> archiveGlobalFailure(cause));

Review comment:
I don't fully understand this case. Are you referring to the scenario where a failure happens, the user cancels the job while the failure is still being handled, and `failJob` is then called while the job is in the `CANCELED`/`CANCELLING` state? In that case, I still think we should archive the exception, because the user's intervention happened after the exception occurred.
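
To make the scenario concrete, here is a minimal, self-contained sketch of what I have in mind. It is not the actual `SchedulerBase` code: `SketchScheduler`, `terminate`, `getArchivedFailures` and the reduced `JobStatus` enum are hypothetical stand-ins, and the only part taken from the diff is the `thenRun` registration on the termination future. Under these assumptions, the failure is archived even if the job ends up in `CANCELED`, because `thenRun` fires on any normal completion of the future.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class ArchiveOnTerminationSketch {

    /** Hypothetical, reduced stand-in for the job's terminal states. */
    enum JobStatus { FAILED, CANCELED }

    /** Hypothetical, heavily simplified stand-in for the scheduler. */
    static class SketchScheduler {
        private final CompletableFuture<JobStatus> terminationFuture = new CompletableFuture<>();
        private final List<Throwable> archivedFailures = new ArrayList<>();

        void failJob(Throwable cause) {
            // Mirrors the line under review: archive the cause once the job
            // has reached a terminal state, whichever state that is.
            terminationFuture.thenRun(() -> archivedFailures.add(cause));
        }

        /** Assumption: termination is modelled as completing the future. */
        void terminate(JobStatus status) {
            terminationFuture.complete(status);
        }

        List<Throwable> getArchivedFailures() {
            return archivedFailures;
        }
    }

    /** failJob is called while cancellation is in progress; the job then ends as CANCELED. */
    static List<Throwable> failWhileCancelling() {
        SketchScheduler scheduler = new SketchScheduler();
        scheduler.failJob(new RuntimeException("task failure"));
        scheduler.terminate(JobStatus.CANCELED);
        return scheduler.getArchivedFailures();
    }

    /** failJob is called after the job has already reached CANCELED. */
    static List<Throwable> failAfterCanceled() {
        SketchScheduler scheduler = new SketchScheduler();
        scheduler.terminate(JobStatus.CANCELED);
        // The future is already complete, so the callback runs immediately.
        scheduler.failJob(new RuntimeException("late failure"));
        return scheduler.getArchivedFailures();
    }

    public static void main(String[] args) {
        System.out.println(failWhileCancelling().size());
        System.out.println(failAfterCanceled().size());
    }
}
```

In both orderings the sketch archives exactly one failure, which matches the behaviour I would expect here: the cancellation does not hide the exception that preceded it.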