[ 
https://issues.apache.org/jira/browse/FLINK-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513981#comment-14513981
 ] 

ASF GitHub Bot commented on FLINK-1925:
---------------------------------------

Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/622#discussion_r29140659
  
    --- Diff: 
flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala
 ---
    @@ -795,108 +795,111 @@ extends Actor with ActorLogMessages with 
ActorLogging {
         var startRegisteringTask = 0L
         var task: Task = null
     
    -    // all operations are in a try / catch block to make sure we send a 
result upon any failure
    -    try {
    -      // check that we are already registered
    -      if (!isConnected) {
    -        throw new IllegalStateException("TaskManager is not associated 
with a JobManager")
    -      }
    -      if (slot < 0 || slot >= numberOfSlots) {
    -        throw new Exception(s"Target slot ${slot} does not exist on 
TaskManager.")
    -      }
    +    if (!isConnected) {
    +      sender ! Failure(
    +        new IllegalStateException("TaskManager is not associated with a 
JobManager.")
    +      )
    +    } else if (slot < 0 || slot >= numberOfSlots) {
    +      sender ! Failure(new Exception(s"Target slot $slot does not exist on 
TaskManager."))
    +    } else {
    +      sender ! Acknowledge
     
    -      val userCodeClassLoader = libraryCacheManager match {
    -        case Some(manager) =>
    -          if (LOG.isDebugEnabled) {
    -            startRegisteringTask = System.currentTimeMillis()
    -          }
    +      Future {
    +        try {
    --- End diff --
    
    Can we pull the code in the future into its own method? Makes it easier to 
understand by separating the different parts along the methods.


> Split SubmitTask method up into two phases: Receive TDD and instantiation of 
> TDD
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-1925
>                 URL: https://issues.apache.org/jira/browse/FLINK-1925
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>
> A user reported that a job times out while submitting tasks to the 
> TaskManager. The reason is that the JobManager expects a TaskOperationResult 
> response upon submitting a task to the TM. The TM downloads then the required 
> jars from the JM which blocks the actor thread and can take a very long time 
> if many TMs download from the JM. Due to this, the SubmitTask future throws a 
> TimeOutException.
> A possible solution could be that the TM eagerly acknowledges the reception 
> of the SubmitTask message and executes the task initialization within a 
> future. The future will upon completion send a UpdateTaskExecutionState 
> message to the JM which switches the state of the task from deploying to 
> running. This means that the handler of SubmitTask future in {{Execution}} 
> won't change the state of the task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to