Robert Metzger created FLINK-2079:
-------------------------------------
Summary: Add watcher to YARN TM containers to detect stopped actor system
Key: FLINK-2079
URL: https://issues.apache.org/jira/browse/FLINK-2079
Project: Flink
Issue Type: Improvement
Components: TaskManager, YARN Client
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: Robert Metzger
I experienced an OutOfMemoryError (caused by user code) while running Flink
on YARN.
It seems that the TaskManager correctly detects the fatal error, but the JVM
does not shut down, so YARN won't bring up new containers.
Therefore, I want to start a thread in the YarnTaskManagerRunner which
periodically (every 30 seconds) checks whether the actor system is still
running. If it is not, the thread calls System.exit(1).
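
A minimal sketch of such a watcher thread, to make the proposal concrete. The
class name ActorSystemWatchdog and its start() method are hypothetical and not
taken from the Flink code base; the liveness check and the exit action are
passed in as parameters so the sketch does not depend on any particular Akka
or Flink API.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Flink: polls a liveness check and runs an
// action once the check reports that the watched system has stopped.
public class ActorSystemWatchdog {

    /**
     * Starts a daemon thread that calls isRunning every intervalMillis
     * milliseconds and runs onStopped the first time it returns false.
     */
    public static Thread start(BooleanSupplier isRunning,
                               long intervalMillis,
                               Runnable onStopped) {
        Thread watcher = new Thread(() -> {
            try {
                while (isRunning.getAsBoolean()) {
                    Thread.sleep(intervalMillis);
                }
            } catch (InterruptedException e) {
                // The watcher itself was cancelled; do not trigger the action.
                Thread.currentThread().interrupt();
                return;
            }
            onStopped.run();
        }, "actor-system-watchdog");
        // Daemon thread, so the watcher never keeps the JVM alive on its own.
        watcher.setDaemon(true);
        watcher.start();
        return watcher;
    }
}
```

In the YarnTaskManagerRunner this could be started with a 30000 ms interval,
a check that asks the actor system whether it has terminated, and
() -> System.exit(1) as the action. How exactly the actor system is queried
depends on the Akka version in use and is left open here.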