[
https://issues.apache.org/jira/browse/TEZ-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010352#comment-15010352
]
Jeff Zhang commented on TEZ-2581:
---------------------------------
bq. Also, should we send a TA_KILLED event here? That would killed the attempt
(after success) and do the necessary things for that (e.g. notify that its
events are failed to its consumers). Then TA would go to killed state and
notify Task. And then task would reschedule new attempt (existing flow).
However, looking at the TA_KILLED handling, it seems that for leaf tasks, this
transition ignores the event and does nothing (which would be wrong here).
A TA_KILLED event would only be sent in two cases: normal DAG execution, and
recovery where the task attempt can be recovered. When the task attempt cannot
be recovered, the task goes to RUNNING directly.
{code}
if (!recoveredData) {
  LOG.info("Can not recovery the successful task attempt, schedule new task attempt,"
      + "taskId=" + task.getTaskId());
  task.successfulAttempt = null;
  task.addAndScheduleAttempt(successTaId);
  return TaskStateInternal.RUNNING;
}
{code}
bq. Could you please clarify?
I mean that if we make TerminateTransition a MultipleArcTransition, the
transition method has to determine the target state from the input event type.
The input event type could be
TA_FAILED/TA_KILLED/TA_TIMED_OUT/TA_KILL_REQUEST/... and the target state could
be KILL_IN_PROGRESS/FAIL_IN_PROGRESS/KILLED/FAILED, so there are many pairs to
handle. Currently we maintain that mapping when building the state machine.
Even if we only change TerminatedWhileRunningTransition rather than the base
class TerminateTransition, the issue still exists, because
TerminatedWhileRunningTransition is used in 6 places in the state machine.
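To make the concern concrete, here is a rough standalone sketch of the two
styles. It does not use the real Tez or YARN state machine classes: the class
name, the Event/State enums, both method names, and the event-to-state mapping
are all assumptions made up for illustration, not the actual table in the
patch.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified model; not the real Tez state machine classes.
class TerminateTransitionSketch {
    enum Event { TA_FAILED, TA_KILLED, TA_TIMED_OUT, TA_KILL_REQUEST }
    enum State { RUNNING, KILL_IN_PROGRESS, FAIL_IN_PROGRESS, KILLED, FAILED }

    // Single-arc style: the target state is fixed where each arc is
    // registered, so the mapping lives in the state machine table.
    // The pairs below are illustrative only.
    static final Map<Event, State> ARCS_FROM_RUNNING = new EnumMap<>(Event.class);
    static {
        ARCS_FROM_RUNNING.put(Event.TA_FAILED, State.FAIL_IN_PROGRESS);
        ARCS_FROM_RUNNING.put(Event.TA_TIMED_OUT, State.FAIL_IN_PROGRESS);
        ARCS_FROM_RUNNING.put(Event.TA_KILL_REQUEST, State.KILL_IN_PROGRESS);
        ARCS_FROM_RUNNING.put(Event.TA_KILLED, State.KILLED);
    }

    // Multiple-arc style: the transition hook itself has to pick the target
    // state, so it must know every (source state, event type) pair it can be
    // registered for.
    static State targetFromHook(State source, Event event) {
        switch (event) {
            case TA_FAILED:
            case TA_TIMED_OUT:
                return source == State.RUNNING ? State.FAIL_IN_PROGRESS : State.FAILED;
            case TA_KILL_REQUEST:
                return source == State.RUNNING ? State.KILL_IN_PROGRESS : State.KILLED;
            case TA_KILLED:
                return State.KILLED;
            default:
                throw new IllegalArgumentException("Unhandled event: " + event);
        }
    }

    public static void main(String[] args) {
        // Print both views side by side for arcs leaving RUNNING.
        for (Event e : Event.values()) {
            System.out.println(e + ": table=" + ARCS_FROM_RUNNING.get(e)
                + ", hook=" + targetFromHook(State.RUNNING, e));
        }
    }
}
```

In the table style each place that reuses the transition declares its own
target, while in the hook style the switch above would have to grow to cover
every place the transition is registered.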
> Umbrella for Tez Recovery Redesign
> ----------------------------------
>
> Key: TEZ-2581
> URL: https://issues.apache.org/jira/browse/TEZ-2581
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Attachments: TEZ-2581-WIP-1.patch, TEZ-2581-WIP-10.patch,
> TEZ-2581-WIP-11.patch, TEZ-2581-WIP-12.patch, TEZ-2581-WIP-13.patch,
> TEZ-2581-WIP-2.patch, TEZ-2581-WIP-3.patch, TEZ-2581-WIP-4.patch,
> TEZ-2581-WIP-5.patch, TEZ-2581-WIP-6.patch, TEZ-2581-WIP-7.patch,
> TEZ-2581-WIP-8.patch, TEZ-2581-WIP-9.patch, TezRecoveryRedesignProposal.pdf,
> TezRecoveryRedesignV1.1.pdf
>
>