[ 
https://issues.apache.org/jira/browse/FLINK-12152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822432#comment-16822432
 ] 

Paul Lin commented on FLINK-12152:
----------------------------------

[~till.rohrmann] The job ran into 
[FLINK-8902|https://issues.apache.org/jira/browse/FLINK-8902] when we triggered 
a rescale operation. After a restart, the JobManager kept reporting timeouts 
when connecting to the dispatcher (stack trace below), and the web UI was 
unavailable, as in [~yanghua]'s case. The Flink version is 1.5.3.

```
2019-03-05 10:42:06,017 ERROR org.apache.flink.runtime.rest.handler.job.JobIdsHandler       - Implementation error: Unhandled exception.
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#1638863806]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
        at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
        at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
        at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
        at java.lang.Thread.run(Thread.java:748)
```

> Make the vcore that Application Master used configurable for Flink on YARN
> --------------------------------------------------------------------------
>
>                 Key: FLINK-12152
>                 URL: https://issues.apache.org/jira/browse/FLINK-12152
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>            Reporter: vinoyang
>            Assignee: vinoyang
>            Priority: Major
>
> Currently, in the Flink on YARN deployment mode, the vcore count of each 
> application master (AM) is hard-coded to 1.
> In some scenarios we see many Akka timeout logs, and the Flink web UI cannot 
> be opened, although it is still alive. I think the AM has no more thread 
> resources available. We therefore suggest making the number of AM vcores 
> configurable.
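
To illustrate what the request amounts to, here is a minimal sketch of reading the AM vcore count from configuration with a fallback to the current hard-coded value of 1. It is only an illustration under stated assumptions: the key name `yarn.appmaster.vcores`, the class `AmVcoresSketch`, and the method `resolveAmVcores` are hypothetical and are not taken from this issue or from Flink's code.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: resolves the vcore count to request for the YARN application master. */
class AmVcoresSketch {
    // Assumed key name, used for illustration; not confirmed by this issue.
    static final String AM_VCORES_KEY = "yarn.appmaster.vcores";
    // Mirrors the hard-coded value described in the issue.
    static final int DEFAULT_AM_VCORES = 1;

    /** Returns the configured vcore count, or the default when the key is unset or blank. */
    static int resolveAmVcores(Map<String, String> conf) {
        String raw = conf.get(AM_VCORES_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_AM_VCORES;
        }
        int vcores = Integer.parseInt(raw.trim());
        if (vcores < 1) {
            // Reject values that YARN could not schedule.
            throw new IllegalArgumentException(
                    AM_VCORES_KEY + " must be >= 1, got " + vcores);
        }
        return vcores;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(resolveAmVcores(conf)); // key unset, falls back to the default
        conf.put(AM_VCORES_KEY, "4");
        System.out.println(resolveAmVcores(conf)); // uses the configured value
    }
}
```

Keeping the default at 1 would leave existing deployments unchanged, while clusters that see the Akka timeouts described above could raise the value.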



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
