[
https://issues.apache.org/jira/browse/FLINK-14232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gary Yao resolved FLINK-14232.
------------------------------
Resolution: Fixed
1.10: b0e5464beb604f5a6ed3fb9653b32e5c10de7704
> Support global failure handling for DefaultScheduler (SchedulerNG)
> ------------------------------------------------------------------
>
> Key: FLINK-14232
> URL: https://issues.apache.org/jira/browse/FLINK-14232
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination
> Affects Versions: 1.10.0
> Reporter: Zhu Zhu
> Assignee: Zhu Zhu
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.10.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Global failure handling (full restarts) is widely used in ExecutionGraph
> components, and even in other components, to recover the job from an
> inconsistent state.
> We need to support it in DefaultScheduler so as not to break this safety net.
> For more details, see [here|https://github.com/apache/flink/pull/9663/files#r326892524].
> Follow-ups to this task may replace usages of full restarts with JVM
> termination in cases that are considered bugs or are not expected to happen.
> Implementation plan:
> 1. Add {{getGlobalFailureHandlingResult(Throwable)}} in
> {{ExecutionFailureHandler}}
> 2. Add a method {{handleGlobalFailure(Throwable)}} to the {{SchedulerNG}}
> interface and implement it in {{DefaultScheduler}}
> 3. Add a method {{notifyGlobalFailure(Throwable)}} to the
> {{InternalTaskFailuresListener}} interface and rework the implementations to
> use {{SchedulerNG#handleGlobalFailure}}
> 4. Rework {{ExecutionGraph#failGlobal}} to invoke
> {{InternalTaskFailuresListener#notifyGlobalFailure}} when the NG scheduler is
> in use
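
The call chain described in the implementation plan can be sketched roughly as follows. This is a minimal, self-contained illustration, not Flink's actual code: the types below are simplified stand-ins that only borrow the names from the plan, and `FailureHandlingResult`, its `restartAll` flag, the `restartCauses` list, and the `GlobalFailureSketch` driver class are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical result type: only records the cause and whether to restart everything.
final class FailureHandlingResult {
    final Throwable cause;
    final boolean restartAll;

    FailureHandlingResult(Throwable cause, boolean restartAll) {
        this.cause = cause;
        this.restartAll = restartAll;
    }
}

// Step 1: the failure handler answers "what to do" for a global failure.
final class ExecutionFailureHandler {
    FailureHandlingResult getGlobalFailureHandlingResult(Throwable cause) {
        // Assumption: a global failure always maps to a full restart in this sketch.
        return new FailureHandlingResult(cause, true);
    }
}

// Step 2: the scheduler interface gains a global failure entry point.
interface SchedulerNG {
    void handleGlobalFailure(Throwable cause);
}

final class DefaultScheduler implements SchedulerNG {
    private final ExecutionFailureHandler failureHandler = new ExecutionFailureHandler();

    // Stand-in for real restart logic: remembers the message of each failure
    // that triggered a full restart.
    final List<String> restartCauses = new ArrayList<>();

    @Override
    public void handleGlobalFailure(Throwable cause) {
        FailureHandlingResult result = failureHandler.getGlobalFailureHandlingResult(cause);
        if (result.restartAll) {
            restartCauses.add(result.cause.getMessage());
        }
    }
}

// Step 3: the listener forwards global failures to the scheduler.
interface InternalTaskFailuresListener {
    void notifyGlobalFailure(Throwable cause);
}

// Step 4: failGlobal delegates to the listener instead of restarting by itself.
final class ExecutionGraph {
    private final InternalTaskFailuresListener listener;

    ExecutionGraph(InternalTaskFailuresListener listener) {
        this.listener = listener;
    }

    void failGlobal(Throwable cause) {
        listener.notifyGlobalFailure(cause);
    }
}

class GlobalFailureSketch {
    public static void main(String[] args) {
        DefaultScheduler scheduler = new DefaultScheduler();
        // The listener is wired directly to SchedulerNG#handleGlobalFailure.
        ExecutionGraph graph = new ExecutionGraph(scheduler::handleGlobalFailure);
        graph.failGlobal(new IllegalStateException("inconsistent state"));
        System.out.println(scheduler.restartCauses);
    }
}
```

In this sketch the ExecutionGraph stand-in never decides how to recover. It only reports the failure, so the decision stays with the scheduler and its failure handler, which is the separation the plan appears to aim for.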