[ https://issues.apache.org/jira/browse/FLINK-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315899#comment-14315899 ]
ASF GitHub Bot commented on FLINK-1489: --------------------------------------- Github user uce commented on the pull request: https://github.com/apache/flink/pull/378#issuecomment-73854842 There is a problem: https://travis-ci.org/apache/flink/jobs/50215407 ``` java.lang.IllegalStateException: Consumer state is FINISHED but was expected to be RUNNING. at org.apache.flink.runtime.deployment.PartialPartitionInfo.createPartitionInfo(PartialPartitionInfo.java:81) at org.apache.flink.runtime.executiongraph.Execution.sendPartitionInfos(Execution.java:581) at org.apache.flink.runtime.executiongraph.Execution.switchToRunning(Execution.java:654) at org.apache.flink.runtime.executiongraph.Execution.access$100(Execution.java:88) at org.apache.flink.runtime.executiongraph.Execution$2.onComplete(Execution.java:336) at akka.dispatch.OnComplete.internal(Future.scala:247) at akka.dispatch.OnComplete.internal(Future.scala:244) at akka.dispatch.japi$CallbackBridge.apply(Future.scala:174) at akka.dispatch.japi$CallbackBridge.apply(Future.scala:171) at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) ``` > Failing JobManager due to blocking calls in > Execution.scheduleOrUpdateConsumers > ------------------------------------------------------------------------------- > > Key: FLINK-1489 > URL: https://issues.apache.org/jira/browse/FLINK-1489 > Project: Flink > Issue Type: Bug > Reporter: Till Rohrmann > Assignee: Till Rohrmann > > [~Zentol] reported that the JobManager failed to execute his python job. The > reason is that the the JobManager executes blocking calls in the actor thread > in the method {{Execution.sendUpdateTaskRpcCall}} as a result to receiving a > {{ScheduleOrUpdateConsumers}} message. > Every TaskManager possibly sends a {{ScheduleOrUpdateConsumers}} to the > JobManager to notify the consumers about available data. The JobManager then > sends to each TaskManager the respective update call > {{Execution.sendUpdateTaskRpcCall}}. By blocking the actor thread, we > effectively execute the update calls sequentially. Due to the ever > accumulating delay, some of the initial timeouts on the TaskManager side in > {{IntermediateResultParititon.scheduleOrUpdateConsumers}} fail. As a result > the execution of the respective Tasks fails. > A solution would be to make the call non-blocking. > A general caveat for actor programming is: We should never block the actor > thread, otherwise we seriously jeopardize the scalability of the system. Or > even worse, the system simply fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)