[ 
https://issues.apache.org/jira/browse/TEZ-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201403#comment-14201403
 ] 

Jeff Zhang commented on TEZ-1734:
---------------------------------

In VertexImpl's RecoveryTransition, vertex would recover to RUNNING from FAILED 
( that's why I add checking recoveryStartEventSeen is true in the patch in this 
case)
{code}
        case FAILED:
        case KILLED:
          vertex.tasksNotYetScheduled = false;
          // recover tasks
          assert vertex.tasks.size() == vertex.numTasks;
          if (vertex.tasks != null  && vertex.numTasks != 0) {
            TaskState taskState = TaskState.KILLED;
            switch (vertex.recoveredState) {
              case SUCCEEDED:
                taskState = TaskState.SUCCEEDED;
                break;
              case KILLED:
                taskState = TaskState.KILLED;
                break;
              case FAILED:
                taskState = TaskState.FAILED;
                break;
            }
            for (Task task : vertex.tasks.values()) {
              vertex.eventHandler.handle(
                  new TaskEventRecoverTask(task.getTaskId(),
                      taskState));
            }
            // Wait for all tasks to recover and report back
            try {
              vertex.recoveryCodeSimulatingStart();
              endState = VertexState.RUNNING;
            } catch (AMUserCodeException e) {
              String msg = "Exception in " + e.getSource() +", vertex:" + 
vertex.getLogIdentifier();
              LOG.error(msg, e);
              vertex.finished(VertexState.FAILED, 
VertexTerminationCause.AM_USERCODE_FAILURE,
                  msg + "," + ExceptionUtils.getStackTrace(e.getCause()));
              endState = VertexState.FAILED;
            }
          } else {
            endState = vertex.recoveredState;
            vertex.finished(endState);
          }
          break;
{code}

> Vertex's taskNum may be -1 when recovered from NEW to FAILED/KILLED
> -------------------------------------------------------------------
>
>                 Key: TEZ-1734
>                 URL: https://issues.apache.org/jira/browse/TEZ-1734
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.1
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>         Attachments: TEZ-1734-2.patch, TEZ-1734.patch
>
>
> When vertex recovered from NEW to FAILED/KILLED, the taskNum may be -1, in 
> this case, we don't need to recover its tasks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to