[
https://issues.apache.org/jira/browse/FLINK-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Till Rohrmann updated FLINK-1925:
---------------------------------
Description:
A user reported that a job times out while submitting tasks to the TaskManager
(TM). The reason is that the JobManager (JM) expects a TaskOperationResult
response after submitting a task to the TM. The TM then downloads the required
jars from the JM, which blocks the actor thread and can take a very long time
if many TMs download from the JM at the same time. As a result, the SubmitTask
future fails with a TimeOutException.
A possible solution is for the TM to eagerly acknowledge receipt of the
SubmitTask message and execute the task initialization within a future. Upon
completion, the future sends an UpdateTaskExecutionState message to the JM,
which switches the state of the task from deploying to running. This means
that the handler of the SubmitTask future in {{Execution}} no longer changes
the state of the task.
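To make the proposal concrete, here is a minimal, self-contained sketch of the
two phases in plain Java. It is not Flink code: TwoPhaseSubmitSketch,
JobManagerStub, TaskManagerStub and demo are hypothetical names invented for
illustration, a CompletableFuture on a thread pool stands in for the future
mentioned above, and reporting FAILED when initialization throws is an
assumption that the description does not spell out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch only; none of these types are Flink classes.
final class TwoPhaseSubmitSketch {

    enum ExecutionState { DEPLOYING, RUNNING, FAILED }

    /** Hypothetical JobManager-side view of a single task. */
    static final class JobManagerStub {
        final AtomicReference<ExecutionState> state =
                new AtomicReference<>(ExecutionState.DEPLOYING);
        final CountDownLatch updated = new CountDownLatch(1);

        // Stands in for handling an UpdateTaskExecutionState message.
        void updateTaskExecutionState(ExecutionState newState) {
            state.set(newState);
            updated.countDown();
        }
    }

    /** Hypothetical TaskManager-side handler for SubmitTask. */
    static final class TaskManagerStub {
        private final ExecutorService initPool = Executors.newCachedThreadPool();

        // Phase 1: return the acknowledgement immediately.
        // Phase 2: run the slow initialization (e.g. the jar download) off the
        // message-handling thread and report the outcome to the JobManager.
        boolean submitTask(Runnable slowInit, JobManagerStub jm) {
            CompletableFuture
                    .runAsync(slowInit, initPool)
                    .handle((ignored, error) -> error == null
                            ? ExecutionState.RUNNING
                            : ExecutionState.FAILED) // assumption: failures are reported too
                    .thenAccept(jm::updateTaskExecutionState);
            return true; // eager acknowledgement
        }

        void shutdown() {
            initPool.shutdown();
        }
    }

    /** Runs one submission; returns {state right after the ack, final state}. */
    static ExecutionState[] demo(boolean initFails) {
        JobManagerStub jm = new JobManagerStub();
        TaskManagerStub tm = new TaskManagerStub();
        CountDownLatch release = new CountDownLatch(1);
        Runnable slowInit = () -> {
            try {
                release.await(); // simulated long-running jar download
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (initFails) {
                throw new IllegalStateException("download failed");
            }
        };

        tm.submitTask(slowInit, jm);
        // The ack has been returned, but initialization is still blocked.
        ExecutionState afterAck = jm.state.get();
        release.countDown(); // let the simulated download finish

        boolean done;
        try {
            done = jm.updated.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            done = false;
        }
        tm.shutdown();
        if (!done) {
            throw new IllegalStateException("no state update received");
        }
        return new ExecutionState[] { afterAck, jm.state.get() };
    }

    public static void main(String[] args) {
        ExecutionState[] ok = demo(false);
        System.out.println(ok[0] + " -> " + ok[1]);
        ExecutionState[] failed = demo(true);
        System.out.println(failed[0] + " -> " + failed[1]);
    }
}
```

In this sketch the caller of submitTask never waits for the download, so a
timeout on the acknowledgement no longer depends on how long initialization
takes. The state change arrives later as a separate update, which mirrors the
idea that the SubmitTask handler itself does not switch the task state.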
was:
ResearchGate reported that a job times out while submitting tasks to the
TaskManager (TM). The reason is that the JobManager (JM) expects a
TaskOperationResult response after submitting a task to the TM. The TM then
downloads the required jars from the JM, which blocks the actor thread and can
take a very long time if many TMs download from the JM at the same time. As a
result, the SubmitTask future fails with a TimeOutException.
A possible solution is for the TM to eagerly acknowledge receipt of the
SubmitTask message and execute the task initialization within a future. Upon
completion, the future sends an UpdateTaskExecutionState message to the JM,
which switches the state of the task from deploying to running. This means
that the handler of the SubmitTask future in {{Execution}} no longer changes
the state of the task.
> Split SubmitTask method up into two phases: Receive TDD and instantiation of
> TDD
> --------------------------------------------------------------------------------
>
> Key: FLINK-1925
> URL: https://issues.apache.org/jira/browse/FLINK-1925
> Project: Flink
> Issue Type: Improvement
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann