[
https://issues.apache.org/jira/browse/TEZ-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201403#comment-14201403
]
Jeff Zhang commented on TEZ-1734:
---------------------------------
In VertexImpl's RecoveryTransition, a vertex can recover to RUNNING from FAILED
(that is why the patch adds a check that recoveryStartEventSeen is true in this
case):
{code}
case FAILED:
case KILLED:
  vertex.tasksNotYetScheduled = false;
  // recover tasks
  assert vertex.tasks.size() == vertex.numTasks;
  if (vertex.tasks != null && vertex.numTasks != 0) {
    TaskState taskState = TaskState.KILLED;
    switch (vertex.recoveredState) {
      case SUCCEEDED:
        taskState = TaskState.SUCCEEDED;
        break;
      case KILLED:
        taskState = TaskState.KILLED;
        break;
      case FAILED:
        taskState = TaskState.FAILED;
        break;
    }
    for (Task task : vertex.tasks.values()) {
      vertex.eventHandler.handle(
          new TaskEventRecoverTask(task.getTaskId(),
              taskState));
    }
    // Wait for all tasks to recover and report back
    try {
      vertex.recoveryCodeSimulatingStart();
      endState = VertexState.RUNNING;
    } catch (AMUserCodeException e) {
      String msg = "Exception in " + e.getSource() +", vertex:" +
          vertex.getLogIdentifier();
      LOG.error(msg, e);
      vertex.finished(VertexState.FAILED,
          VertexTerminationCause.AM_USERCODE_FAILURE,
          msg + "," + ExceptionUtils.getStackTrace(e.getCause()));
      endState = VertexState.FAILED;
    }
  } else {
    endState = vertex.recoveredState;
    vertex.finished(endState);
  }
  break;
{code}
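Note that the `numTasks != 0` test above is also true when numTasks is -1, so
a vertex whose task count was never initialized would still enter the task
recovery branch. Below is a minimal, self-contained sketch of the kind of guard
being discussed. It is not the actual patch: the class RecoveryGuardSketch, the
method endStateAfterRecovery, and the local VertexState enum are hypothetical
stand-ins, and treating any numTasks <= 0 as "nothing to recover" is an
assumption made for illustration.

```java
// Hypothetical illustration only; none of these names come from Tez.
final class RecoveryGuardSketch {

    // Local stand-in for the vertex states mentioned in this issue.
    enum VertexState { NEW, RUNNING, SUCCEEDED, FAILED, KILLED }

    /**
     * Decides the state a vertex should end up in after recovery.
     *
     * @param recoveredState         the FAILED/KILLED state read from the recovery log
     * @param numTasks               recorded task count; -1 means it was never set
     * @param recoveryStartEventSeen whether a start event was seen during recovery
     */
    static VertexState endStateAfterRecovery(VertexState recoveredState,
                                             int numTasks,
                                             boolean recoveryStartEventSeen) {
        // Assumption: tasks are only worth recovering if the vertex actually
        // started and has a positive task count. A bare "numTasks != 0" check
        // would wrongly accept -1.
        boolean hasTasksToRecover = recoveryStartEventSeen && numTasks > 0;
        if (hasTasksToRecover) {
            // Tasks are replayed, so the vertex waits in RUNNING for them
            // to report back.
            return VertexState.RUNNING;
        }
        // Nothing to replay: finish directly in the recovered state.
        return recoveredState;
    }

    public static void main(String[] args) {
        // Vertex recovered from NEW to FAILED: task count was never set.
        System.out.println(endStateAfterRecovery(VertexState.FAILED, -1, false));
        // Vertex that had started with 4 tasks before failing.
        System.out.println(endStateAfterRecovery(VertexState.FAILED, 4, true));
    }
}
```

In this sketch the NEW-to-FAILED/KILLED case falls through to the recovered
state without touching any tasks, while a vertex that had actually started
still goes through RUNNING.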
> Vertex's taskNum may be -1 when recovered from NEW to FAILED/KILLED
> -------------------------------------------------------------------
>
> Key: TEZ-1734
> URL: https://issues.apache.org/jira/browse/TEZ-1734
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.5.1
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Attachments: TEZ-1734-2.patch, TEZ-1734.patch
>
>
> When a vertex is recovered from NEW to FAILED/KILLED, its taskNum may be -1.
> In this case, we don't need to recover its tasks.