[
https://issues.apache.org/jira/browse/FLINK-12152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822432#comment-16822432
]
Paul Lin edited comment on FLINK-12152 at 4/20/19 1:06 PM:
-----------------------------------------------------------
[~till.rohrmann] The job ran into FLINK-8902 when we triggered a rescale
operation, and after a restart the JobManager kept reporting timeouts when
connecting to the dispatcher (stack trace below). And yes, the job could not
respond to any user commands, including cancel, so we had to kill the YARN
application and resubmit the job. Meanwhile, the web UI was unavailable, as in
[~yanghua]'s case. The Flink version is 1.5.3.
```
2019-03-05 10:42:06,017 ERROR org.apache.flink.runtime.rest.handler.job.JobIdsHandler - Implementation error: Unhandled exception.
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#1638863806]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
    at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
    at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
    at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
    at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
    at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
    at java.lang.Thread.run(Thread.java:748)
```
> Make the vcore that Application Master used configurable for Flink on YARN
> --------------------------------------------------------------------------
>
> Key: FLINK-12152
> URL: https://issues.apache.org/jira/browse/FLINK-12152
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / YARN
> Reporter: vinoyang
> Assignee: vinoyang
> Priority: Major
>
> Currently, in the Flink on YARN deployment mode, each AM's vcores value is
> hard-coded to 1.
> In some scenarios, we found many Akka timeout logs, and the Flink web UI could
> not be opened even though the process was alive. I think the AM has no more
> thread resources available. So we suggest making the number of vcores for the
> application master configurable.
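
To illustrate the proposal, here is a minimal sketch of how a configurable AM vcores value could be resolved, falling back to the current hard-coded value of 1. This is not Flink's implementation: the key name `yarn.appmaster.vcores`, the class `AmVcoresSketch`, and the `maxVcoresPerContainer` bound are assumptions made for this example only.

```java
import java.util.HashMap;
import java.util.Map;

public class AmVcoresSketch {
    // Assumed key name, for illustration only; not confirmed by this issue.
    static final String AM_VCORES_KEY = "yarn.appmaster.vcores";
    // Matches the value that the issue describes as hard-coded today.
    static final int DEFAULT_AM_VCORES = 1;

    // Returns the configured AM vcores, or the default when the key is absent.
    // maxVcoresPerContainer is a hypothetical upper bound supplied by the caller.
    static int resolveAmVcores(Map<String, String> conf, int maxVcoresPerContainer) {
        String raw = conf.get(AM_VCORES_KEY);
        int vcores = (raw == null) ? DEFAULT_AM_VCORES : Integer.parseInt(raw.trim());
        if (vcores < 1 || vcores > maxVcoresPerContainer) {
            throw new IllegalArgumentException(
                AM_VCORES_KEY + " must be between 1 and " + maxVcoresPerContainer
                    + ", but was " + vcores);
        }
        return vcores;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // No value configured: falls back to the default.
        System.out.println(resolveAmVcores(conf, 8));
        // Explicit value configured by the user.
        conf.put(AM_VCORES_KEY, "2");
        System.out.println(resolveAmVcores(conf, 8));
    }
}
```

Keeping 1 as the default would leave existing deployments unchanged, while users who see the Akka timeouts described above could raise the value.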