zhuzhurk commented on a change in pull request #9663: [WIP][FLINK-12433][runtime] Implement DefaultScheduler stub URL: https://github.com/apache/flink/pull/9663#discussion_r326892524
########## File path: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionGraph.java ########## @@ -1129,6 +1157,9 @@ void failGlobalIfExecutionIsStillRunning(Throwable cause, ExecutionAttemptID fai * @param t The exception that caused the failure. */ public void failGlobal(Throwable t) { + if (!isLegacyScheduling()) { + ExceptionUtils.rethrow(t); Review comment: Re-throwing this error may not work in some cases. `failGlobal` is used as a safety net to recover the job from unexpected inconsistent state. And in my mind this method is not intended to throw any exceptions and many invocations of it do not handle exceptions from it. One problem I can identify is that the `CheckpointFailureManager` would break with this change as it depends on failGlobal. Another possible issue is that all the errors happening in execution updates from TM previously lead to `failGlobal` to recover. Now the errors will cause the RPC to return an error response. Reworking all the possible (with DefaultScheduler) entrances to `failGlobal` still requires a way to handle the errors. Maybe we need to implement a `failGlobal` mechanism that works for DefaultScheduler. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services