[
https://issues.apache.org/jira/browse/FLINK-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16544192#comment-16544192
]
ASF GitHub Bot commented on FLINK-5232:
---------------------------------------
Github user yanghua commented on the issue:
https://github.com/apache/flink/pull/6334
Hi @tillrohrmann, I tried to fix this issue based on your suggestion in the
JIRA, but I have a question I would like to ask you.
The question is about the `ActorSystem`. You suggested adding an uncaught
exception handler for the `ActorSystem`. To do this, we would need to extend
`ActorSystemImpl` (the default implementation). This class's constructor has
[many
parameters](https://github.com/akka/akka/blob/master/akka-actor/src/main/scala/akka/actor/ActorSystem.scala#L651).
I am not very familiar with it, so I tried filling in the ["default"
params](https://github.com/yanghua/flink/blob/27dec5d60d2e799aeea66013b3da904cec137408/flink-runtime/src/main/scala/org/apache/flink/runtime/akka/RobustActorSystem.scala#L33).
When I ran the test cases, they always failed because of the fifth parameter.
So the question is: `ActorSystemImpl` is marked as `InternalApi`, so it
may change in the future. Should we build a custom actor system on top of it? If
yes, what are the correct values for these parameters?
I have seen some similar custom implementations, such as
[this](https://gist.github.com/aarondav/ca1f0cdcd50727f89c0d#file-exceptioncatchingactorsystemimpl-scala-L14)
and
[this](https://gist.github.com/Kayrnt/9082178#file-rebootactorsystem-scala-L28).
However, both appear to be based on older versions.
I would appreciate your ideas and suggestions.
> Add a Thread default uncaught exception handler on the JobManager
> -----------------------------------------------------------------
>
> Key: FLINK-5232
> URL: https://issues.apache.org/jira/browse/FLINK-5232
> Project: Flink
> Issue Type: Sub-task
> Components: JobManager
> Reporter: Stephan Ewen
> Assignee: vinoyang
> Priority: Major
> Labels: pull-request-available
>
> When some JobManager threads die because of uncaught exceptions, we should
> bring down the JobManager. If a thread dies from an uncaught exception, there
> is a high chance that the JobManager becomes dysfunctional.
> The only safe thing is to rely on the JobManager being restarted by YARN /
> Mesos / Kubernetes / etc.
> I suggest adding this code to the JobManager launch:
> {code}
> Thread.setDefaultUncaughtExceptionHandler(new UncaughtExceptionHandler() {
>     @Override
>     public void uncaughtException(Thread t, Throwable e) {
>         try {
>             LOG.error("Thread {} died due to an uncaught exception. Killing process.", t.getName());
>         } finally {
>             Runtime.getRuntime().halt(-1);
>         }
>     }
> });
> {code}
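
The handler proposed in the issue description can be sketched as a small standalone class. This is only an illustration under stated assumptions: `HaltingUncaughtExceptionHandler` and its injectable exit action are hypothetical names that are not part of Flink, and `System.err` stands in for the `LOG` used in the issue. Making the exit action injectable is one possible way to exercise the handler in a test without actually halting the JVM.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

/** Hypothetical helper (not part of Flink): reports the failure, then runs an exit action. */
final class HaltingUncaughtExceptionHandler implements Thread.UncaughtExceptionHandler {

    static final int EXIT_CODE = -1;

    private final IntConsumer exitAction;

    /** Production variant: halts the JVM immediately, without running shutdown hooks. */
    public HaltingUncaughtExceptionHandler() {
        this(code -> Runtime.getRuntime().halt(code));
    }

    /** Variant with an injectable exit action, so that tests do not kill the JVM. */
    public HaltingUncaughtExceptionHandler(IntConsumer exitAction) {
        this.exitAction = exitAction;
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        try {
            // Assumption: System.err replaces the logger used in the issue description.
            System.err.println("Thread " + t.getName()
                    + " died due to an uncaught exception. Killing process.");
            e.printStackTrace();
        } finally {
            // Runs even if reporting the failure itself throws.
            exitAction.accept(EXIT_CODE);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Record the exit code instead of halting, to demonstrate the handler safely.
        AtomicInteger recorded = new AtomicInteger(0);
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("boom");
        }, "worker-1");
        worker.setUncaughtExceptionHandler(new HaltingUncaughtExceptionHandler(recorded::set));
        worker.start();
        worker.join();
        System.out.println("recorded exit code: " + recorded.get());
    }
}
```

In a real launch path the no-argument constructor would be installed once via `Thread.setDefaultUncaughtExceptionHandler(new HaltingUncaughtExceptionHandler())`. Note that a default handler only sees exceptions that actually escape a thread, so frameworks that catch throwables internally would still need their own hook.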