Robert Metzger resolved FLINK-1489.
       Resolution: Fixed
    Fix Version/s: 0.9

Merged in http://git-wip-us.apache.org/repos/asf/flink/commit/aedbacfc. Thank 

> Failing JobManager due to blocking calls in 
> Execution.scheduleOrUpdateConsumers
> -------------------------------------------------------------------------------
>                 Key: FLINK-1489
>                 URL: https://issues.apache.org/jira/browse/FLINK-1489
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>             Fix For: 0.9
> [~Zentol] reported that the JobManager failed to execute his Python job. The 
> reason is that the JobManager executes blocking calls in the actor thread 
> in the method {{Execution.sendUpdateTaskRpcCall}} as a result of receiving a 
> {{ScheduleOrUpdateConsumers}} message. 
> Every TaskManager possibly sends a {{ScheduleOrUpdateConsumers}} message to 
> the JobManager to notify the consumers about available data. The JobManager 
> then sends the respective update call to each TaskManager via 
> {{Execution.sendUpdateTaskRpcCall}}. By blocking the actor thread, we 
> effectively execute the update calls sequentially. Because the delay keeps 
> accumulating, some of the initial calls on the TaskManager side in 
> {{IntermediateResultPartition.scheduleOrUpdateConsumers}} time out. As a 
> result, the execution of the respective Tasks fails.
> A solution would be to make the call non-blocking.
> A general caveat for actor programming is: We should never block the actor 
> thread, otherwise we seriously jeopardize the scalability of the system. Or 
> even worse, the system simply fails.
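
The difference between the blocking and the non-blocking variant can be sketched in plain Java. This is only an illustration under stated assumptions, not the code merged for this issue: it uses no Akka and no Flink classes, and the names NonBlockingUpdateSketch, updateTaskRpc, blockingHandlerMillis, dispatchUpdates and the 50 ms latency are all hypothetical. The blocking handler waits for each simulated update call in turn, so its running time grows with the number of TaskManagers. The non-blocking handler only hands the calls to a separate pool and returns futures for the replies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class NonBlockingUpdateSketch {

    // Assumed latency of one simulated update call (illustrative value).
    static final long RPC_MILLIS = 50;

    // Hypothetical stand-in for a remote update call to one TaskManager.
    static String updateTaskRpc(int taskManagerId) {
        try {
            Thread.sleep(RPC_MILLIS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "ack-" + taskManagerId;
    }

    // Blocking variant: the calling thread (the "actor thread" here) waits
    // for every reply in turn, so the delays add up sequentially.
    static long blockingHandlerMillis(int taskManagers) {
        long start = System.nanoTime();
        for (int i = 0; i < taskManagers; i++) {
            updateTaskRpc(i);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Non-blocking variant: the calling thread only submits the calls and
    // returns immediately; replies arrive later through the futures.
    static List<CompletableFuture<String>> dispatchUpdates(
            int taskManagers, ExecutorService rpcPool) {
        List<CompletableFuture<String>> acks = new ArrayList<>();
        for (int i = 0; i < taskManagers; i++) {
            final int id = i;
            acks.add(CompletableFuture.supplyAsync(() -> updateTaskRpc(id), rpcPool));
        }
        return acks;
    }

    public static void main(String[] args) {
        int taskManagers = 8;
        System.out.println("blocking handler: "
                + blockingHandlerMillis(taskManagers) + " ms");

        ExecutorService rpcPool = Executors.newFixedThreadPool(4);
        long start = System.nanoTime();
        List<CompletableFuture<String>> acks = dispatchUpdates(taskManagers, rpcPool);
        long handlerMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("non-blocking handler: " + handlerMillis + " ms");

        // Wait for all replies outside the handler before shutting down.
        CompletableFuture.allOf(acks.toArray(new CompletableFuture[0])).join();
        rpcPool.shutdown();
    }
}
```

In an actor system the same idea usually means reacting to the reply in a callback or a follow-up message instead of waiting for it inside the message handler.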

This message was sent by Atlassian JIRA
