[
https://issues.apache.org/jira/browse/TEZ-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001929#comment-15001929
]
Bikas Saha commented on TEZ-2581:
---------------------------------
The API is called taskRecovery, but it is essentially attemptRecovery, in the
same sense that taskCommit is essentially done by the attempt. In any case, the
current approach works too. Looking at that code again, I have a couple of
comments:
1) IMO, we should not penalize the task for this because it's not the task's
fault. This removes the need to duplicate the numRetry checking logic.
2) Also, in the current code, the setting of successAttempt etc. is spread
across the transition body and the recoveryCheck method, which is a little
confusing. Could we change the recoverCheck method to just return whether the
task commits got recovered or not, and then take action on that in the main
transition?
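
To illustrate points 1) and 2), here is a rough sketch of the shape I have in
mind. It is not the actual TaskImpl code: the class, field, and method names
(RecoverySketch, isCommitRecovered, onRecover, failedAttempts) are made up for
illustration, and the real check would consult the committer rather than take
two booleans.

```java
// Hypothetical sketch only; none of these names come from the Tez code base.
class RecoverySketch {
    enum TaskState { SCHEDULED, SUCCEEDED }

    static final class Task {
        TaskState state = TaskState.SCHEDULED;
        Integer successfulAttempt = null; // stand-in for successAttempt
        int failedAttempts = 0;           // stand-in for the retry counter
    }

    // The check only answers "was the commit recovered?" and mutates nothing.
    static boolean isCommitRecovered(boolean outputCommitted, boolean recoverySucceeded) {
        return outputCommitted && recoverySucceeded;
    }

    // All state changes live in the transition, driven by the boolean result.
    static TaskState onRecover(Task task, int attemptId, boolean commitRecovered) {
        if (commitRecovered) {
            task.successfulAttempt = attemptId;
            task.state = TaskState.SUCCEEDED;
        } else {
            // Not the task's fault: reschedule without touching failedAttempts,
            // so no retry-limit check needs to be duplicated here.
            task.successfulAttempt = null;
            task.state = TaskState.SCHEDULED;
        }
        return task.state;
    }
}
```

With this split, the check stays side-effect free and the transition is the
only place that sets the successful attempt or the task state.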
Changing the TEZ-2939 JIRA title and description to reflect the intent would
be good.
> Umbrella for Tez Recovery Redesign
> ----------------------------------
>
> Key: TEZ-2581
> URL: https://issues.apache.org/jira/browse/TEZ-2581
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Attachments: TEZ-2581-WIP-1.patch, TEZ-2581-WIP-10.patch,
> TEZ-2581-WIP-11.patch, TEZ-2581-WIP-2.patch, TEZ-2581-WIP-3.patch,
> TEZ-2581-WIP-4.patch, TEZ-2581-WIP-5.patch, TEZ-2581-WIP-6.patch,
> TEZ-2581-WIP-7.patch, TEZ-2581-WIP-8.patch, TEZ-2581-WIP-9.patch,
> TezRecoveryRedesignProposal.pdf, TezRecoveryRedesignV1.1.pdf
>
>