[ 
https://issues.apache.org/jira/browse/FLINK-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510898#comment-14510898
 ] 

ASF GitHub Bot commented on FLINK-1925:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/622

    [FLINK-1925] Fixes blocked method submitTask on the TM

    The ```submitTask``` method, which processes the ```SubmitTask``` message, 
    blocks the TM's actor thread while it downloads the task jars from the 
    ```JobManager```. Depending on the number of TMs and jars, this can take a 
    long time. 
    
    To remove the blocking call, the ```submitTask``` method is split into two 
    phases: reception of the TDD (TaskDeploymentDescriptor) with eager 
    acknowledgement, and instantiation of the TDD with a subsequent state update 
    message. The TDD instantiation runs concurrently in a future. Once the 
    instantiation finishes, an ```UpdateTaskExecutionState``` message with 
    ```ExecutionState.RUNNING``` is sent to the JM. As a consequence, the 
    ```SubmitTask``` future handler created in ```Execution.deployToSlot``` no 
    longer sets the state of the ```Execution``` to ```RUNNING``` directly.
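
The two phases can be illustrated with a minimal Java sketch. This is not the code from the pull request: the class `TwoPhaseSubmitSketch`, the `TaskInstantiator` interface, the `sentToJobManager` list, and the use of `CompletableFuture` are all hypothetical stand-ins chosen for illustration. Reporting `FAILED` when the instantiation throws is likewise an assumption; the description above only covers the `RUNNING` case.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: every name here is illustrative, not Flink's actual API.
final class TwoPhaseSubmitSketch {

    enum ExecutionState { RUNNING, FAILED }

    /** Stand-in for the UpdateTaskExecutionState message sent to the JM. */
    static final class UpdateTaskExecutionState {
        final String taskId;
        final ExecutionState state;

        UpdateTaskExecutionState(String taskId, ExecutionState state) {
            this.taskId = taskId;
            this.state = state;
        }
    }

    /** Stand-in for the slow part: downloading jars and instantiating the task. */
    interface TaskInstantiator {
        void instantiate(String taskId) throws Exception;
    }

    private final Executor executor;
    private final TaskInstantiator instantiator;

    /** Records the messages that would be sent to the JobManager. */
    final List<UpdateTaskExecutionState> sentToJobManager = new CopyOnWriteArrayList<>();

    TwoPhaseSubmitSketch(Executor executor, TaskInstantiator instantiator) {
        this.executor = executor;
        this.instantiator = instantiator;
    }

    /** Phase 1 returns the acknowledgement at once; phase 2 runs in a future. */
    boolean submitTask(String taskId) {
        CompletableFuture
            .runAsync(() -> {
                try {
                    instantiator.instantiate(taskId); // may take a long time
                } catch (Exception e) {
                    throw new CompletionException(e);
                }
            }, executor)
            // Assumption: a failed instantiation is reported as FAILED.
            .whenComplete((ignored, error) -> sentToJobManager.add(
                new UpdateTaskExecutionState(
                    taskId,
                    error == null ? ExecutionState.RUNNING : ExecutionState.FAILED)));
        return true; // eager acknowledgement, independent of the instantiation
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Thread.sleep simulates the jar download.
        TwoPhaseSubmitSketch tm = new TwoPhaseSubmitSketch(pool, id -> Thread.sleep(200));
        System.out.println("acknowledged: " + tm.submitTask("task-1"));
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("updates sent: " + tm.sentToJobManager.size());
    }
}
```

In this sketch the caller of `submitTask` never waits for the jar download, so a slow download can delay the state update but not the acknowledgement.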

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink fixSubmitTask

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/622.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #622
    
----
commit cccb6f2dc9ba506b117ba49f92b561cf35c60b2a
Author: Till Rohrmann <[email protected]>
Date:   2015-04-23T14:13:22Z

    [FLINK-1925] [runtime] Splits the processing of the SubmitTask message into 
two phases: TDD reception with eager acknowledgement and TDD instantiation with 
a subsequent state update message.

----


> Split SubmitTask method up into two phases: Receive TDD and instantiation of 
> TDD
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-1925
>                 URL: https://issues.apache.org/jira/browse/FLINK-1925
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>
> A user reported that a job times out while submitting tasks to the 
> TaskManager. The reason is that the JobManager expects a TaskOperationResult 
> response upon submitting a task to the TM. The TM then downloads the required 
> jars from the JM, which blocks the actor thread and can take a very long time 
> if many TMs download from the JM. Because the response is delayed, the 
> SubmitTask future throws a TimeOutException.
> A possible solution is for the TM to eagerly acknowledge the reception of the 
> SubmitTask message and to execute the task initialization within a future. 
> Upon completion, the future sends an UpdateTaskExecutionState message to the 
> JM, which switches the state of the task from deploying to running. This 
> means that the handler of the SubmitTask future in {{Execution}} no longer 
> changes the state of the task.
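
On the JM side, the proposal amounts to moving the deploying-to-running transition out of the SubmitTask handler and into the handling of the state update message. The following Java sketch illustrates that idea only. `ExecutionStateSketch` and its method names are hypothetical, the state set is reduced to four values, and marking the task as failed when the acknowledgement is negative or times out is an assumption that the issue text does not spell out.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the JM-side state handling; not Flink's Execution class.
final class ExecutionStateSketch {

    enum ExecutionState { SCHEDULED, DEPLOYING, RUNNING, FAILED }

    private final AtomicReference<ExecutionState> state =
        new AtomicReference<>(ExecutionState.SCHEDULED);

    ExecutionState currentState() {
        return state.get();
    }

    /** Sending the SubmitTask message moves the task to DEPLOYING. */
    boolean deployToSlot() {
        return state.compareAndSet(ExecutionState.SCHEDULED, ExecutionState.DEPLOYING);
    }

    /**
     * Handler of the SubmitTask future. A positive acknowledgement leaves the
     * state untouched. Assumption: a negative one or a failure marks FAILED.
     */
    void onSubmitTaskResponse(boolean acknowledged, Throwable failure) {
        if (failure != null || !acknowledged) {
            state.set(ExecutionState.FAILED);
        }
        // On success the task stays in DEPLOYING until the TM reports RUNNING.
    }

    /** Handling of UpdateTaskExecutionState: the only path to RUNNING. */
    boolean onUpdateTaskExecutionState(ExecutionState reported) {
        if (reported == ExecutionState.RUNNING) {
            return state.compareAndSet(ExecutionState.DEPLOYING, ExecutionState.RUNNING);
        }
        if (reported == ExecutionState.FAILED) {
            state.set(ExecutionState.FAILED);
            return true;
        }
        return false; // other transitions are out of scope for this sketch
    }
}
```

With this split, the acknowledgement can arrive quickly and the timeout on the SubmitTask future no longer depends on how long the jar download takes.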


