zhuzhurk commented on a change in pull request #9663: 
[WIP][FLINK-12433][runtime] Implement DefaultScheduler stub
URL: https://github.com/apache/flink/pull/9663#discussion_r326892524
 
 

 ##########
 File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionGraph.java
 ##########
 @@ -1129,6 +1157,9 @@ void failGlobalIfExecutionIsStillRunning(Throwable 
cause, ExecutionAttemptID fai
         * @param t The exception that caused the failure.
         */
        public void failGlobal(Throwable t) {
+               if (!isLegacyScheduling()) {
+                       ExceptionUtils.rethrow(t);
 
 Review comment:
   Re-throwing this error may not work in some cases.
   `failGlobal` is used as a safety net to recover the job from unexpected 
inconsistent state. And in my mind this method is not intended to throw any 
exceptions and many invocations of it do not handle exceptions from it.
   
   One problem I can identify is that the `CheckpointFailureManager` would 
break with this change as it depends on failGlobal.
   Another possible issue is that all the errors happening in execution updates 
from TM previously lead to `failGlobal` to recover. Now the errors will cause 
the RPC to return an error response.
   
   Reworking all the possible (with DefaultScheduler) entrances to `failGlobal` 
still requires a way to handle the errors.
   Maybe we need to implement a `failGlobal` mechanism that works for 
DefaultScheduler.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to