[
https://issues.apache.org/jira/browse/FLINK-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510898#comment-14510898
]
ASF GitHub Bot commented on FLINK-1925:
---------------------------------------
GitHub user tillrohrmann opened a pull request:
https://github.com/apache/flink/pull/622
[FLINK-1925] Fixes blocked method submitTask on the TM
The ```submitTask``` method, which processes the ```SubmitTask``` message,
blocks while downloading the task jars from the ```JobManager```. Depending on
the number of TMs and jars, this can take a long time.
To remove the blocking call, the ```submitTask``` method is split into two
phases: TDD (TaskDeploymentDescriptor) reception with eager acknowledgement,
and TDD instantiation with a subsequent state update message. The TDD
instantiation is executed concurrently in a future. When the instantiation
finishes, an ```UpdateTaskExecutionState``` message with
```ExecutionState.RUNNING``` is sent to the JM. As a consequence, the
```SubmitTask``` future handler created in ```Execution.deployToSlot``` no
longer sets the state of the ```Execution``` to ```RUNNING``` directly.
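The two-phase idea described above can be illustrated with a minimal,
self-contained Java sketch. It is not the code from this pull request: the
class ```TwoPhaseSubmitSketch```, the ```StateUpdate``` type, and the
```Consumer``` standing in for the JM are hypothetical names chosen for
illustration, and reporting ```FAILED``` when the instantiation throws is an
assumption that the description above does not cover.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical stand-in for the TaskManager; none of these names are Flink's actual API.
class TwoPhaseSubmitSketch {

    enum ExecutionState { DEPLOYING, RUNNING, FAILED }

    // Stand-in for the UpdateTaskExecutionState message sent to the JM.
    static final class StateUpdate {
        final String taskId;
        final ExecutionState state;

        StateUpdate(String taskId, ExecutionState state) {
            this.taskId = taskId;
            this.state = state;
        }
    }

    private final ExecutorService pool;
    private final Consumer<StateUpdate> jobManager; // receives state update messages

    TwoPhaseSubmitSketch(ExecutorService pool, Consumer<StateUpdate> jobManager) {
        this.pool = pool;
        this.jobManager = jobManager;
    }

    // Phase 1: accept the descriptor and acknowledge right away.
    // Phase 2: run the (possibly slow) instantiation in a future and report the outcome.
    boolean submitTask(String taskId, Runnable instantiate) {
        CompletableFuture
            .runAsync(instantiate, pool)
            .whenComplete((ignored, error) -> jobManager.accept(
                new StateUpdate(taskId,
                    // Assumption: a failed instantiation is reported as FAILED.
                    error == null ? ExecutionState.RUNNING : ExecutionState.FAILED)));
        return true; // eager acknowledgement; does not wait for phase 2
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        TwoPhaseSubmitSketch tm = new TwoPhaseSubmitSketch(
            pool, update -> System.out.println(update.taskId + " -> " + update.state));

        boolean ack = tm.submitTask("task-1", () -> {
            try {
                Thread.sleep(200); // simulates downloading the task jars
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("acknowledged: " + ack);

        pool.shutdown(); // already submitted work still runs to completion
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

In this sketch the caller gets its acknowledgement without waiting for the
download, so a timeout on the submit call no longer depends on how long the
jars take to arrive. The state change is delivered later as a separate message.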
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tillrohrmann/flink fixSubmitTask
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/622.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #622
----
commit cccb6f2dc9ba506b117ba49f92b561cf35c60b2a
Author: Till Rohrmann <[email protected]>
Date: 2015-04-23T14:13:22Z
[FLINK-1925] [runtime] Splits the processing of the SubmitTask message into
two phases: TDD reception with eager acknowledgement and TDD instantiation with
a subsequent state update message.
----
> Split SubmitTask method up into two phases: Receive TDD and instantiation of
> TDD
> --------------------------------------------------------------------------------
>
> Key: FLINK-1925
> URL: https://issues.apache.org/jira/browse/FLINK-1925
> Project: Flink
> Issue Type: Improvement
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
>
> A user reported that a job times out while submitting tasks to the
> TaskManager. The reason is that the JobManager expects a TaskOperationResult
> response upon submitting a task to the TM. The TM then downloads the required
> jars from the JM, which blocks the actor thread and can take a very long time
> if many TMs download from the JM. Due to this, the SubmitTask future throws a
> TimeOutException.
> A possible solution could be that the TM eagerly acknowledges the reception
> of the SubmitTask message and executes the task initialization within a
> future. Upon completion, the future sends an UpdateTaskExecutionState
> message to the JM, which switches the state of the task from deploying to
> running. This means that the handler of the SubmitTask future in {{Execution}}
> won't change the state of the task.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)