[
https://issues.apache.org/jira/browse/FLINK-8624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419081#comment-16419081
]
Till Rohrmann commented on FLINK-8624:
--------------------------------------
Hi [~bbayani], does this problem also exist for the Flink 1.5 release branch?
> flink-mesos: The flink rest-api sometimes becomes unresponsive
> --------------------------------------------------------------
>
> Key: FLINK-8624
> URL: https://issues.apache.org/jira/browse/FLINK-8624
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination, REST
> Affects Versions: 1.3.2
> Reporter: Bhumika Bayani
> Priority: Blocker
> Fix For: 1.5.0
>
>
> Sometimes the flink-mesos-scheduler fails or gets killed, and Marathon brings it up
> again on another node. In some of these cases we have observed that the REST API of the
> newly created Flink instance becomes unresponsive.
> Even if we execute API calls manually with curl, such as
> http://<host>:<port>/overview or http://<host>:<port>/config,
> we do not receive any response.
> We submit and execute all our Flink jobs through the REST API only. So if the REST API
> becomes unresponsive, we cannot run any of our Flink jobs and
> no stream processing happens.
> We tried enabling Flink debug logs, but we did not observe anything specific
> that indicates why the REST API is failing or unresponsive.
> We see the exceptions below in the logs, but they are not specific to the case when
> the Flink API is hung. We see them in a healthy flink-scheduler too:
>
> {code:java}
> Timestamp=2018-02-08 05:43:49,175 LogLevel=INFO ThreadId=[Checkpoint Timer] Class=o.a.f.r.c.CheckpointCoordinator Msg=Triggering checkpoint 10181 @ 1518068629174
> Timestamp=2018-02-08 05:43:49,183 LogLevel=DEBUG ThreadId=[nioEventLoopGroup-5-3] Class=o.a.f.r.w.WebRuntimeMonitor Msg=Unhandled exception: {}
> akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/jobmanager#753807801]] after [10000 ms]
>     at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
>     at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
>     at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
>     at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
>     at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
>     at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:474) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
>     at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:425) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
>     at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:429) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
>     at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:381) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> {code}
>
> While the REST API is unresponsive, we have observed that the Flink web UI
> also does not load or show any information.
> Restarting the flink-scheduler sometimes solves this issue.
>
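A minimal sketch, not part of the original report, of how the manual curl checks against /overview and /config could be scripted with explicit timeouts, so that a hung REST endpoint shows up as a failed probe instead of a call that never returns. It assumes Java 8 (the JVM version visible in the log above) and uses only `java.net.HttpURLConnection`; the class name `RestProbe`, the method `isResponsive`, the 5000 ms timeout and the default base URL `http://localhost:8081` are hypothetical choices for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper: probes a REST endpoint and reports whether it answered in time.
public class RestProbe {

    /**
     * Returns true only if GET baseUrl + path returns HTTP 200 and the body is
     * fully read within the given timeouts. Returns false on connect/read
     * timeout, connection failure, malformed URL, or any non-200 status.
     */
    static boolean isResponsive(String baseUrl, String path, int timeoutMs) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(baseUrl + path).openConnection();
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(timeoutMs); // limit for opening the TCP connection
            conn.setReadTimeout(timeoutMs);    // limit for each wait on response bytes
            if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
                return false;
            }
            // Drain the body so a server that stalls mid-response also counts as unresponsive.
            try (InputStream body = conn.getInputStream()) {
                byte[] buf = new byte[4096];
                while (body.read(buf) != -1) {
                    // discard
                }
            }
            return true;
        } catch (IOException e) {
            // Covers SocketTimeoutException, ConnectException and MalformedURLException.
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // Assumed default; pass http://<host>:<port> as the first argument instead.
        String base = args.length > 0 ? args[0] : "http://localhost:8081";
        for (String path : new String[] {"/overview", "/config"}) {
            System.out.println(path + " responsive: " + isResponsive(base, path, 5000));
        }
    }
}
```

Run as `java RestProbe http://<host>:<port>`. Bounding both the connect and the read phase is the design choice here: in the situation described above the TCP connection may still be accepted while no response ever arrives, which only the read timeout catches.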