[ https://issues.apache.org/jira/browse/FLINK-8624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419081#comment-16419081 ]
Till Rohrmann commented on FLINK-8624:
--------------------------------------

Hi [~bbayani], does this problem also exist on the Flink 1.5 release branch?

> flink-mesos: The flink rest-api sometimes becomes unresponsive
> --------------------------------------------------------------
>
>                 Key: FLINK-8624
>                 URL: https://issues.apache.org/jira/browse/FLINK-8624
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination, REST
>    Affects Versions: 1.3.2
>            Reporter: Bhumika Bayani
>            Priority: Blocker
>             Fix For: 1.5.0
>
>
> Sometimes flink-mesos-scheduler fails or gets killed, and Marathon brings it
> up again on some other node. We have sometimes observed that the rest-api of
> the newly created flink instance becomes unresponsive.
> Even if we execute api calls manually with curl, such as
> http://<host>:<port>/overview or http://<host>:<port>/config
> we do not receive any response.
> We submit and execute all our flink-jobs using the rest-api only. So if the
> rest api becomes unresponsive, we cannot run any of the flink jobs and no
> stream processing happens.
> We tried enabling flink debug logs, but we did not observe anything specific
> that indicates why the rest api is failing/unresponsive.
> We see the exceptions below in the logs, but they are not specific to the
> case when the flink-api is hung. We see them in a healthy flink-scheduler too:
>
> {code:java}
> Timestamp=2018-02-08 05:43:49,175 LogLevel=INFO ThreadId=[Checkpoint Timer] Class=o.a.f.r.c.CheckpointCoordinator Msg=Triggering checkpoint 10181 @ 1518068629174
> Timestamp=2018-02-08 05:43:49,183 LogLevel=DEBUG ThreadId=[nioEventLoopGroup-5-3] Class=o.a.f.r.w.WebRuntimeMonitor Msg=Unhandled exception: {}
> akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/jobmanager#753807801]] after [10000 ms]
>     at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
>     at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
>     at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
>     at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
>     at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
>     at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:474) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
>     at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:425) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
>     at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:429) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
>     at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:381) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> {code}
>
> While the rest api is unresponsive, we have observed that the flink web UI
> also does not load or show any information.
> Restarting the flink-scheduler sometimes solves this issue.
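
The manual curl check described in the report could be scripted roughly as follows. This is only a sketch, not part of the original report: the function name `probe`, the result labels, and the 5 second default timeout are assumptions chosen for illustration. The `/overview` and `/config` paths are the ones named in the report. An explicit timeout matters here because a hung endpoint never answers, so a client without one would block indefinitely.

```python
import socket
import urllib.error
import urllib.request


def probe(base_url, paths=("/overview", "/config"), timeout=5.0):
    """Query each REST path once and report how it behaved.

    Returns a dict mapping each path to either the HTTP status code (int)
    or one of the labels "timeout" / "unreachable" (hypothetical labels).
    """
    results = {}
    for path in paths:
        try:
            with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
                results[path] = resp.status
        except urllib.error.HTTPError as exc:
            # The server answered, but with an error status (4xx/5xx).
            results[path] = exc.code
        except urllib.error.URLError as exc:
            # Connection-phase failures are wrapped in URLError.
            if isinstance(exc.reason, (socket.timeout, TimeoutError)):
                results[path] = "timeout"
            else:
                results[path] = "unreachable"
        except (socket.timeout, TimeoutError):
            # Connected, but no response arrived within the timeout:
            # this is the "hung rest-api" symptom from the report.
            results[path] = "timeout"
        except OSError:
            # Other socket-level failures, e.g. connection reset.
            results[path] = "unreachable"
    return results
```

Calling `probe("http://<host>:<port>")` with the real JobManager address filled in returns one entry per path. A "timeout" entry while the process is still running would match the symptom described above, whereas "unreachable" would point at the scheduler not listening at all.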
--
This message was sent by Atlassian JIRA (v7.6.3#76005)