[ 
https://issues.apache.org/jira/browse/TEZ-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010352#comment-15010352
 ] 

Jeff Zhang commented on TEZ-2581:
---------------------------------

bq. Also, should we send a TA_KILLED event here? That would kill the attempt 
(after success) and do the necessary things for that (e.g. notify its 
consumers that its events have failed). Then the TA would go to the killed 
state and notify the Task, and the task would reschedule a new attempt 
(existing flow). However, looking at the TA_KILLED handling, it seems that for 
leaf tasks this transition ignores the event and does nothing (which would be 
wrong here).
A TA_KILLED event would only be sent in the case of normal DAG execution and 
when the task attempt can be recovered. When the task attempt cannot be 
recovered, the task goes directly to RUNNING.
{code}
if (!recoveredData) {
  // No recovery data for the successful attempt: drop it and schedule a new one.
  LOG.info("Can not recovery the successful task attempt, schedule new task attempt,"
      + "taskId=" + task.getTaskId());
  task.successfulAttempt = null;
  task.addAndScheduleAttempt(successTaId);
  return TaskStateInternal.RUNNING;
}
{code}

bq. Could you please clarify?
I mean that if we make TerminateTransition a MultipleArcTransition, we need to 
determine the target state in the transition method based on the input event 
type. The input event type could be 
TA_FAILED/TA_KILLED/TA_TIMED_OUT/TA_KILL_REQUEST/... and the target state could 
be KILL_IN_PROGRESS/FAIL_IN_PROGRESS/KILLED/FAILED, so there are many pairs. 
Currently we maintain that mapping when building the state machine. Even if we 
only change TerminatedWhileRunningTransition rather than the base class 
TerminateTransition, the issue still exists: TerminatedWhileRunningTransition 
is used in 6 places in the state machine.
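
To make the concern concrete, here is a rough, self-contained sketch of the 
two styles. It does not use the real Tez or Hadoop state machine classes; the 
class name, the method names and the event-to-state pairs are made up for 
illustration and are not taken from the actual Tez state machine.
{code}
import java.util.EnumMap;
import java.util.Map;

public class TerminateArcSketch {
  // Hypothetical subsets of states and event types, for illustration only.
  enum State { RUNNING, KILL_IN_PROGRESS, FAIL_IN_PROGRESS, KILLED, FAILED }
  enum EventType { TA_FAILED, TA_KILLED, TA_TIMED_OUT, TA_KILL_REQUEST }

  // Single-arc style: the target state is declared once per (source, event)
  // arc when the state machine is built; the transition only has side effects.
  static final Map<EventType, State> ARCS_FROM_RUNNING =
      new EnumMap<>(EventType.class);
  static {
    ARCS_FROM_RUNNING.put(EventType.TA_KILL_REQUEST, State.KILL_IN_PROGRESS);
    ARCS_FROM_RUNNING.put(EventType.TA_FAILED, State.FAIL_IN_PROGRESS);
    ARCS_FROM_RUNNING.put(EventType.TA_TIMED_OUT, State.FAIL_IN_PROGRESS);
    ARCS_FROM_RUNNING.put(EventType.TA_KILLED, State.KILLED);
  }

  static State singleArcTarget(EventType event) {
    return ARCS_FROM_RUNNING.get(event);
  }

  // Multiple-arc style: the transition itself has to pick the target state,
  // so the same mapping is repeated in code, once for every place the
  // transition is used with a different source state.
  static State multipleArcTarget(EventType event) {
    switch (event) {
      case TA_KILL_REQUEST:
        return State.KILL_IN_PROGRESS;
      case TA_FAILED:
      case TA_TIMED_OUT:
        return State.FAIL_IN_PROGRESS;
      case TA_KILLED:
        return State.KILLED;
      default:
        throw new IllegalArgumentException("Unexpected event: " + event);
    }
  }

  public static void main(String[] args) {
    for (EventType e : EventType.values()) {
      System.out.println(e + " -> " + singleArcTarget(e)
          + " / " + multipleArcTarget(e));
    }
  }
}
{code}
In the multiple-arc version the switch duplicates what the arc table already 
says, and it would have to be repeated for each source state that uses the 
transition.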

> Umbrella for Tez Recovery Redesign
> ----------------------------------
>
>                 Key: TEZ-2581
>                 URL: https://issues.apache.org/jira/browse/TEZ-2581
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>         Attachments: TEZ-2581-WIP-1.patch, TEZ-2581-WIP-10.patch, 
> TEZ-2581-WIP-11.patch, TEZ-2581-WIP-12.patch, TEZ-2581-WIP-13.patch, 
> TEZ-2581-WIP-2.patch, TEZ-2581-WIP-3.patch, TEZ-2581-WIP-4.patch, 
> TEZ-2581-WIP-5.patch, TEZ-2581-WIP-6.patch, TEZ-2581-WIP-7.patch, 
> TEZ-2581-WIP-8.patch, TEZ-2581-WIP-9.patch, TezRecoveryRedesignProposal.pdf, 
> TezRecoveryRedesignV1.1.pdf
>
>




