[ 
https://issues.apache.org/jira/browse/FLINK-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315899#comment-14315899
 ] 

ASF GitHub Bot commented on FLINK-1489:
---------------------------------------

Github user uce commented on the pull request:

    https://github.com/apache/flink/pull/378#issuecomment-73854842
  
    There is a problem: https://travis-ci.org/apache/flink/jobs/50215407
    ```
    java.lang.IllegalStateException: Consumer state is FINISHED but was 
expected to be RUNNING.
        at 
org.apache.flink.runtime.deployment.PartialPartitionInfo.createPartitionInfo(PartialPartitionInfo.java:81)
        at 
org.apache.flink.runtime.executiongraph.Execution.sendPartitionInfos(Execution.java:581)
        at 
org.apache.flink.runtime.executiongraph.Execution.switchToRunning(Execution.java:654)
        at 
org.apache.flink.runtime.executiongraph.Execution.access$100(Execution.java:88)
        at 
org.apache.flink.runtime.executiongraph.Execution$2.onComplete(Execution.java:336)
        at akka.dispatch.OnComplete.internal(Future.scala:247)
        at akka.dispatch.OnComplete.internal(Future.scala:244)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:174)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:171)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
        at 
scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    ```


> Failing JobManager due to blocking calls in 
> Execution.scheduleOrUpdateConsumers
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-1489
>                 URL: https://issues.apache.org/jira/browse/FLINK-1489
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>
> [~Zentol] reported that the JobManager failed to execute his python job. The 
> reason is that the the JobManager executes blocking calls in the actor thread 
> in the method {{Execution.sendUpdateTaskRpcCall}} as a result to receiving a 
> {{ScheduleOrUpdateConsumers}} message. 
> Every TaskManager possibly sends a {{ScheduleOrUpdateConsumers}} to the 
> JobManager to notify the consumers about available data. The JobManager then 
> sends to each TaskManager the respective update call 
> {{Execution.sendUpdateTaskRpcCall}}. By blocking the actor thread, we 
> effectively execute the update calls sequentially. Due to the ever 
> accumulating delay, some of the initial timeouts on the TaskManager side in 
> {{IntermediateResultParititon.scheduleOrUpdateConsumers}} fail. As a result 
> the execution of the respective Tasks fails.
> A solution would be to make the call non-blocking.
> A general caveat for actor programming is: We should never block the actor 
> thread, otherwise we seriously jeopardize the scalability of the system. Or 
> even worse, the system simply fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to