Till Rohrmann created FLINK-9456:
------------------------------------
Summary: Let ResourceManager notify JobManager about failed/killed TaskManagers
Key: FLINK-9456
URL: https://issues.apache.org/jira/browse/FLINK-9456
Project: Flink
Issue Type: Improvement
Components: Distributed Coordination
Affects Versions: 1.5.0
Reporter: Till Rohrmann
Fix For: 1.6.0, 1.5.1
The {{ResourceManager}} often learns about {{TaskManager}} failures or kills
before the {{JobManager}} does, because it communicates directly with the
underlying resource management framework. Instead of relying only on the
{{JobManager}}'s heartbeat to detect that a {{TaskManager}} has died, the
{{ResourceManager}} should additionally signal the {{JobManager}} when a
{{TaskManager}} dies. That way, we can react faster to {{TaskManager}} failures
and recover the running jobs sooner.
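
The proposal could be sketched roughly as follows. This is a minimal,
self-contained illustration and not Flink's actual code: the class names
(SketchResourceManager, SketchJobManager), the JobManagerListener interface,
and every method name are hypothetical and chosen only for this example. It
shows the heartbeat path and the new ResourceManager signal funnelling into the
same idempotent handler, so whichever arrives first triggers recovery exactly
once.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TaskManagerFailureNotification {

    /** Hypothetical callback; NOT Flink's real gateway interface. */
    interface JobManagerListener {
        void notifyTaskManagerTerminated(String taskManagerId, String cause);
    }

    /** Hypothetical JobManager: keeps heartbeat detection, also accepts explicit signals. */
    static class SketchJobManager implements JobManagerListener {
        private final Set<String> liveTaskManagers = new HashSet<>();
        final List<String> recoveries = new ArrayList<>();

        void registerTaskManager(String taskManagerId) {
            liveTaskManagers.add(taskManagerId);
        }

        @Override
        public void notifyTaskManagerTerminated(String taskManagerId, String cause) {
            // Idempotent: only the first report for a TaskManager starts recovery.
            if (liveTaskManagers.remove(taskManagerId)) {
                recoveries.add(taskManagerId + ": " + cause);
            }
        }

        /** Existing (slower) path: the heartbeat to the TaskManager timed out. */
        void onHeartbeatTimeout(String taskManagerId) {
            notifyTaskManagerTerminated(taskManagerId, "heartbeat timeout");
        }
    }

    /** Hypothetical ResourceManager: hears about dead containers from the cluster framework. */
    static class SketchResourceManager {
        private final Map<String, List<JobManagerListener>> listenersByTaskManager = new HashMap<>();

        /** Remember which JobManagers use a given TaskManager. */
        void registerUsage(String taskManagerId, JobManagerListener jobManager) {
            listenersByTaskManager
                .computeIfAbsent(taskManagerId, k -> new ArrayList<>())
                .add(jobManager);
        }

        /** Called when the resource framework reports the container as failed or killed. */
        void onTaskManagerTerminated(String taskManagerId, String cause) {
            List<JobManagerListener> listeners = listenersByTaskManager.remove(taskManagerId);
            if (listeners == null) {
                return; // unknown or already reported
            }
            for (JobManagerListener listener : listeners) {
                listener.notifyTaskManagerTerminated(taskManagerId, cause);
            }
        }
    }

    public static void main(String[] args) {
        SketchJobManager jobManager = new SketchJobManager();
        SketchResourceManager resourceManager = new SketchResourceManager();
        jobManager.registerTaskManager("tm-1");
        resourceManager.registerUsage("tm-1", jobManager);

        // The ResourceManager signal arrives first; the later heartbeat timeout is a no-op.
        resourceManager.onTaskManagerTerminated("tm-1", "container killed");
        jobManager.onHeartbeatTimeout("tm-1");
        System.out.println(jobManager.recoveries);
    }
}
```

Keeping the heartbeat path in place means the signal is purely an optimization:
if the notification is lost or the framework never reports the failure, the
heartbeat timeout still detects the dead {{TaskManager}}.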