[ 
https://issues.apache.org/jira/browse/FLINK-12152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822432#comment-16822432
 ] 

Paul Lin edited comment on FLINK-12152 at 4/20/19 1:06 PM:
-----------------------------------------------------------

[~till.rohrmann] The job ran into FLINK-8902 when we triggered a rescale 
operation. After the restart, the JobManager kept reporting timeouts when 
connecting to the dispatcher (stack trace below). And yes, the job did not 
respond to any user commands, including cancel, so we had to kill the YARN 
application and resubmit the job. Meanwhile, the web UI was unavailable, as in 
[~yanghua]'s case. The Flink version is 1.5.3.

```
2019-03-05 10:42:06,017 ERROR org.apache.flink.runtime.rest.handler.job.JobIdsHandler       - Implementation error: Unhandled exception.
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#1638863806]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
        at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
        at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
        at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
        at java.lang.Thread.run(Thread.java:748)
```


was (Author: paul lin):
[~till.rohrmann] The job ran into 
[FLINK-8902|https://issues.apache.org/jira/browse/FLINK-8902] when we triggered 
a rescale operation. After the restart, the JobManager kept reporting timeouts 
when connecting to the dispatcher (stack trace below). The web UI was 
unavailable after the restart, as in [~yanghua]'s case. The Flink version is 1.5.3.

```
2019-03-05 10:42:06,017 ERROR org.apache.flink.runtime.rest.handler.job.JobIdsHandler       - Implementation error: Unhandled exception.
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#1638863806]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
        at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
        at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
        at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
        at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
        at java.lang.Thread.run(Thread.java:748)
```

> Make the vcore that Application Master used configurable for Flink on YARN
> --------------------------------------------------------------------------
>
>                 Key: FLINK-12152
>                 URL: https://issues.apache.org/jira/browse/FLINK-12152
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>            Reporter: vinoyang
>            Assignee: vinoyang
>            Priority: Major
>
> Currently, in Flink on YARN deployment mode, the number of vcores for each 
> application master (AM) is hard-coded to 1.
> In some scenarios we see many Akka timeout logs, and the Flink web UI cannot be 
> opened even though the application is still running. I suspect the AM has no 
> thread resources left. We therefore suggest making the number of vcores for 
> the application master configurable.
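
For illustration only, here is a minimal sketch of what the proposal amounts to: 
read the AM vcore count from configuration and fall back to the current 
hard-coded value of 1. The class name, the key name "yarn.appmaster.vcores" and 
the use of java.util.Properties are assumptions made for this sketch, not 
Flink's actual code.

```java
import java.util.Properties;

/** Hypothetical helper: resolves the AM vcore count from configuration. */
final class AmVcoresConfig {
    // Assumed key name, used here for illustration only.
    static final String KEY = "yarn.appmaster.vcores";
    // Matches the value that is hard-coded today according to the issue.
    static final int DEFAULT_VCORES = 1;

    private AmVcoresConfig() {}

    /** Returns the configured vcore count, or the default if the key is unset. */
    static int resolve(Properties conf) {
        String raw = conf.getProperty(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_VCORES;
        }
        int vcores;
        try {
            vcores = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(KEY + " must be an integer, got: " + raw, e);
        }
        if (vcores < 1) {
            throw new IllegalArgumentException(KEY + " must be at least 1, got: " + vcores);
        }
        return vcores;
    }
}
```

With no key set, resolve(...) keeps the existing behaviour of 1 vcore, so jobs 
that do not set the option would be unaffected.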



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
