[ 
https://issues.apache.org/jira/browse/TEZ-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15015038#comment-15015038
 ] 

Bikas Saha commented on TEZ-2581:
---------------------------------

bq. The TA_KILLED event would only be sent in the case of normal DAG execution 
and when the task attempt can be recovered. If the task attempt cannot be 
recovered, it would go to RUNNING directly.
I agree that the task will move into the running state. But the task attempt 
(the one that failed commit recovery) will remain in the succeeded state, 
right? I was thinking that it should move into the killed state, or else it 
will show up incorrectly as a successful attempt in the UI etc. Is that valid? 
If yes, then we can do this as a follow-up.

About the multiple-arc transition, I was thinking that the default return state 
would be the existing kill_in_progress state. Only in the new case would we 
override the default return state. In any case, I see what you are saying. Then 
the potential alternatives are:
1) Send a new recovery-specific event to transition from fail_in_progress to 
failed, instead of a fake container event.
2) From start_wait, can we go directly into killed/failed based on the recovery 
data? What advantage do we get by going into the running state and then 
transitioning? Would this trigger any container-related code that might cause 
the AM to crash because containers are not recovered? Also, for a killed/failed 
task attempt, we don't know whether it had gone to the running state in the 
previous AM. There are valid transitions from start_wait directly to 
killed/failed. So that attempt could have gone into failed/killed from 
start_wait in the previous AM, while in the retry AM we would try to send it 
through start_wait->running->kill/fail-in-progress->killed/failed. Thoughts?
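
To make the two ideas above concrete, here is a minimal sketch in plain Java. It 
is not the Tez state machine or the patch under review. The class, enum, and 
method names (RecoveryTransitionSketch, AttemptState, RecoveredOutcome, 
fromStartWait, onKillRequest) are hypothetical and exist only for illustration. 
It assumes the recovery data records at most one terminal outcome per attempt.

```java
import java.util.Optional;

// Illustrative only: hypothetical names, not the actual Tez classes.
class RecoveryTransitionSketch {

    // Simplified attempt states, named after the ones mentioned in this thread.
    enum AttemptState { START_WAIT, RUNNING, KILL_IN_PROGRESS, KILLED, FAILED }

    // Assumed shape of the recovery data: the terminal outcome seen by the previous AM.
    enum RecoveredOutcome { KILLED, FAILED }

    // Alternative 2: from START_WAIT, jump straight to the recovered terminal
    // state, so no container-related code runs for an attempt that is already done.
    static AttemptState fromStartWait(Optional<RecoveredOutcome> recovered) {
        if (!recovered.isPresent()) {
            // No terminal outcome recorded: follow the normal path.
            return AttemptState.RUNNING;
        }
        return recovered.get() == RecoveredOutcome.KILLED
                ? AttemptState.KILLED
                : AttemptState.FAILED;
    }

    // Multiple-arc transition: the default return state stays KILL_IN_PROGRESS,
    // and only the new recovery case overrides it.
    static AttemptState onKillRequest(boolean terminatedInPreviousAm) {
        AttemptState next = AttemptState.KILL_IN_PROGRESS; // default arc
        if (terminatedInPreviousAm) {
            next = AttemptState.KILLED; // recovery-only arc
        }
        return next;
    }

    public static void main(String[] args) {
        System.out.println(fromStartWait(Optional.of(RecoveredOutcome.FAILED)));
        System.out.println(fromStartWait(Optional.empty()));
        System.out.println(onKillRequest(false));
        System.out.println(onKillRequest(true));
    }
}
```

With this shape, a recovered attempt never passes through running or the 
in-progress states in the retry AM. Whether that is safe for the real state 
machine is exactly the open question in alternative 2.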


> Umbrella for Tez Recovery Redesign
> ----------------------------------
>
>                 Key: TEZ-2581
>                 URL: https://issues.apache.org/jira/browse/TEZ-2581
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>         Attachments: TEZ-2581-WIP-1.patch, TEZ-2581-WIP-10.patch, 
> TEZ-2581-WIP-11.patch, TEZ-2581-WIP-12.patch, TEZ-2581-WIP-13.patch, 
> TEZ-2581-WIP-2.patch, TEZ-2581-WIP-3.patch, TEZ-2581-WIP-4.patch, 
> TEZ-2581-WIP-5.patch, TEZ-2581-WIP-6.patch, TEZ-2581-WIP-7.patch, 
> TEZ-2581-WIP-8.patch, TEZ-2581-WIP-9.patch, TezRecoveryRedesignProposal.pdf, 
> TezRecoveryRedesignV1.1.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)