[
https://issues.apache.org/jira/browse/TEZ-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15021636#comment-15021636
]
Jeff Zhang commented on TEZ-2581:
---------------------------------
bq. Is this still needed now that we handle TA recovery in the scheduled
transition itself?
It is not possible to move from RUNNING to FAILED, but it is still possible to
move from RUNNING to RUNNING. Changed it as follows:
{code}
.addTransition(TaskStateInternal.RUNNING,
    EnumSet.of(TaskStateInternal.SUCCEEDED, TaskStateInternal.RUNNING),
    TaskEventType.T_ATTEMPT_SUCCEEDED,
    new AttemptSucceededTransition())
{code}
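For readers less familiar with multi-arc transitions, here is a minimal standalone sketch of the idea behind the snippet above: one (state, event) pair declares a set of allowed post-states, and the transition hook picks one of them at runtime. Everything in it is hypothetical and simplified (MultiArcSketch, State, onAttemptSucceeded, and especially the "known attempt" condition); it is not the Tez state machine code and does not claim to show when the real AttemptSucceededTransition stays in RUNNING.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified stand-in for a multi-arc transition; not Tez code.
class MultiArcSketch {
    enum State { RUNNING, SUCCEEDED }

    // Allowed post-states for (RUNNING, T_ATTEMPT_SUCCEEDED), mirroring the
    // EnumSet passed to addTransition in the snippet above.
    static final EnumSet<State> POST_STATES =
        EnumSet.of(State.SUCCEEDED, State.RUNNING);

    // Assumed rule, for illustration only: a success report from an attempt
    // the task does not track is ignored, so the task stays RUNNING.
    static State onAttemptSucceeded(Set<Integer> knownAttempts, int attemptId) {
        State next = knownAttempts.contains(attemptId)
            ? State.SUCCEEDED
            : State.RUNNING;
        // A multi-arc hook must return one of the declared post-states.
        if (!POST_STATES.contains(next)) {
            throw new IllegalStateException("Undeclared post-state: " + next);
        }
        return next;
    }

    public static void main(String[] args) {
        Set<Integer> known = Set.of(0, 1);
        System.out.println(onAttemptSucceeded(known, 1)); // prints SUCCEEDED
        System.out.println(onAttemptSucceeded(known, 7)); // prints RUNNING
    }
}
```

The point of declaring the EnumSet up front is that the state machine can reject a hook that returns a state it never registered, such as FAILED in this case.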
bq. Can the TaskFinishedEvent have a failed state also? Previous AM had a task
failure and then crashed before dag failed finish was written to recovery?
I'm not sure what "failed state" means here. There is a single field that
tracks the finished state of the Task. As for the case where a task fails and
the AM then crashes before the DAG's failed finish is written: as long as the
TaskFinishedEvent with the failed state is logged (it is not a summary event,
so it may not be logged in time), the DAG will be recovered to failed.
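A rough sketch of that recovery rule, under stated assumptions: the recovery log is modeled as a plain list of task-finished records, and a single record with a failed state is enough to recover the DAG as failed. The names here (RecoveryReplaySketch, TaskFinishedRecord, RecoveredDagState, recover) are hypothetical and are not the classes in the patch. If the failed record was never flushed to the log, it is simply absent from the list and this check cannot see it.

```java
import java.util.List;

// Hypothetical, simplified model of replaying task-finished records.
class RecoveryReplaySketch {
    enum TaskFinalState { SUCCEEDED, FAILED, KILLED }

    // UNDECIDED means this check alone does not settle the DAG state.
    enum RecoveredDagState { FAILED, UNDECIDED }

    // Stand-in for a logged TaskFinishedEvent: one field holds the final state.
    static final class TaskFinishedRecord {
        final String taskId;
        final TaskFinalState state;

        TaskFinishedRecord(String taskId, TaskFinalState state) {
            this.taskId = taskId;
            this.state = state;
        }
    }

    // Recover the DAG as FAILED if any logged record carries a failed state.
    static RecoveredDagState recover(List<TaskFinishedRecord> loggedRecords) {
        for (TaskFinishedRecord record : loggedRecords) {
            if (record.state == TaskFinalState.FAILED) {
                return RecoveredDagState.FAILED;
            }
        }
        return RecoveredDagState.UNDECIDED;
    }

    public static void main(String[] args) {
        List<TaskFinishedRecord> log = List.of(
            new TaskFinishedRecord("task_1", TaskFinalState.SUCCEEDED),
            new TaskFinishedRecord("task_2", TaskFinalState.FAILED));
        System.out.println(recover(log));       // prints FAILED
        System.out.println(recover(List.of())); // prints UNDECIDED
    }
}
```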
bq. For speculation, the winning TA goes to succeeded. The other TA is killed.
I mean the TA_DONE event for the speculative TA may be sent before the KILL event.
> Umbrella for Tez Recovery Redesign
> ----------------------------------
>
> Key: TEZ-2581
> URL: https://issues.apache.org/jira/browse/TEZ-2581
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Attachments: TEZ-2581-WIP-1.patch, TEZ-2581-WIP-10.patch,
> TEZ-2581-WIP-11.patch, TEZ-2581-WIP-12.patch, TEZ-2581-WIP-13.patch,
> TEZ-2581-WIP-14.patch, TEZ-2581-WIP-2.patch, TEZ-2581-WIP-3.patch,
> TEZ-2581-WIP-4.patch, TEZ-2581-WIP-5.patch, TEZ-2581-WIP-6.patch,
> TEZ-2581-WIP-7.patch, TEZ-2581-WIP-8.patch, TEZ-2581-WIP-9.patch,
> TezRecoveryRedesignProposal.pdf, TezRecoveryRedesignV1.1.pdf
>
>