[
https://issues.apache.org/jira/browse/TEZ-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979723#comment-14979723
]
Bikas Saha commented on TEZ-2581:
---------------------------------
bq. Should be "vertexData.getVertexFinishedEvent() == null"; will fix it.
And it's still working :) That means this code is either not relevant or there
is a bug. We are missing a test case that would exercise this code path.
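A test that covers both outcomes of the corrected check would catch an inverted
condition like this one. The sketch below is hypothetical: `VertexData` is a
stand-in that only models the nullable finished event (as a `String`), and
`isVertexUnfinished` is an illustrative name, not actual Tez code.

```java
// Hypothetical stand-in for per-vertex recovery data; only the nullable
// finished event matters for this sketch.
final class VertexData {
    private final String vertexFinishedEvent; // null when no finished event was logged

    VertexData(String vertexFinishedEvent) {
        this.vertexFinishedEvent = vertexFinishedEvent;
    }

    String getVertexFinishedEvent() {
        return vertexFinishedEvent;
    }
}

final class FinishedCheck {
    // Hypothetical predicate: a vertex is still unfinished when no finished event was recovered.
    static boolean isVertexUnfinished(VertexData vertexData) {
        return vertexData.getVertexFinishedEvent() == null;
    }

    public static void main(String[] args) {
        // Covering both branches means an inverted condition (!= null) fails one of these checks.
        check(isVertexUnfinished(new VertexData(null)), "no finished event => unfinished");
        check(!isVertexUnfinished(new VertexData("VERTEX_FINISHED")), "finished event => finished");
        System.out.println("both branches covered");
    }

    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }
}
```

If the condition were flipped to `!= null`, the first check would throw, which is
the kind of coverage the current tests appear to be missing.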
For isDAGRecoverable() and isRecoverable(), just renaming them to reflect Summary
and NonSummary should be enough. Also, an uber comment explaining the flow, e.g.
1) read the file, 2) check summary recovery, 3) check non-summary recovery, would
help in understanding it.
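The three-step flow suggested for the uber comment could look roughly like this.
It is a minimal sketch under assumptions: `RecoveryFlow`, `RecoveryOutcome`, and
the two predicate names are hypothetical, plain strings stand in for the recovery
files, and the event names are placeholders loosely modeled on Tez history event
types rather than the real recovery rules.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical outcome of the recovery decision.
enum RecoveryOutcome { RECOVERABLE, NOT_RECOVERABLE_SUMMARY, NOT_RECOVERABLE_NON_SUMMARY }

final class RecoveryFlow {
    /*
     * Flow (illustrative only):
     *  1) read the recovery logs into events,
     *  2) check whether the DAG is recoverable from the summary events,
     *  3) check whether it is recoverable from the non-summary events.
     */
    static RecoveryOutcome recover(String summaryFile, String nonSummaryFile) {
        // 1) Read the logs (strings stand in for file contents here).
        List<String> summaryEvents = readEvents(summaryFile);
        List<String> nonSummaryEvents = readEvents(nonSummaryFile);
        // 2) DAG-level summary check first.
        if (!isSummaryRecoverable(summaryEvents)) {
            return RecoveryOutcome.NOT_RECOVERABLE_SUMMARY;
        }
        // 3) Vertex-level non-summary check only if the summary check passed.
        if (!isNonSummaryRecoverable(nonSummaryEvents)) {
            return RecoveryOutcome.NOT_RECOVERABLE_NON_SUMMARY;
        }
        return RecoveryOutcome.RECOVERABLE;
    }

    // One event name per non-blank line.
    static List<String> readEvents(String contents) {
        List<String> events = new ArrayList<>();
        for (String line : contents.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                events.add(trimmed);
            }
        }
        return events;
    }

    // Placeholder rule: a DAG commit that started but never finished blocks recovery.
    static boolean isSummaryRecoverable(List<String> events) {
        return !events.contains("DAG_COMMIT_STARTED") || events.contains("DAG_FINISHED");
    }

    // Placeholder rule: a vertex commit without a vertex finished event blocks recovery.
    static boolean isNonSummaryRecoverable(List<String> events) {
        return !events.contains("VERTEX_COMMIT_STARTED") || events.contains("VERTEX_FINISHED");
    }

    public static void main(String[] args) {
        System.out.println(recover("DAG_SUBMITTED\nDAG_COMMIT_STARTED\n", ""));
    }
}
```

Keeping each step in its own method mirrors the proposed Summary/NonSummary
naming, so the uber comment can describe the flow one method at a time.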
bq. Because it is not known whether this vertex belongs to any vertex group when
parsing recovery logs. So here we check both vertex level commit and vertex group
level commit.
Could you please explain a little more? From what I understand, we check whether
there were any in-progress commit operations. They can be 1) a vertex commit
(either after vertex completion or DAG completion) or 2) a group commit (either
after vertex completion or DAG completion). Both of these have recovery logs.
If these are found but their corresponding finished logs are not, then we can
error out, right? Then why do we need to look at individual members of a
group?
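The check described above, where an unmatched commit start is enough to fail
recovery, can be sketched without looking at group membership at all.
Assumptions: `CommitEvent` and `CommitCheck` are hypothetical types, each commit
is identified by its scope (vertex or group) plus a name, and the "finished"
record is modeled generically rather than as a specific Tez event.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical log record: a commit start or finish for a single vertex or a vertex group.
final class CommitEvent {
    enum Scope { VERTEX, GROUP }
    enum Phase { STARTED, FINISHED }

    final Scope scope;
    final String name; // vertex name or vertex group name
    final Phase phase;

    CommitEvent(Scope scope, String name, Phase phase) {
        this.scope = scope;
        this.name = name;
        this.phase = phase;
    }
}

final class CommitCheck {
    // True when every started commit (vertex or group) also has a finished record.
    // Group members are never inspected; the group's own start/finish pair is enough.
    static boolean noInProgressCommits(List<CommitEvent> log) {
        Set<String> open = new HashSet<>();
        for (CommitEvent e : log) {
            String key = e.scope + ":" + e.name; // separate namespaces for vertices and groups
            if (e.phase == CommitEvent.Phase.STARTED) {
                open.add(key);
            } else {
                open.remove(key);
            }
        }
        return open.isEmpty();
    }

    public static void main(String[] args) {
        List<CommitEvent> log = Arrays.asList(
            new CommitEvent(CommitEvent.Scope.VERTEX, "v1", CommitEvent.Phase.STARTED),
            new CommitEvent(CommitEvent.Scope.VERTEX, "v1", CommitEvent.Phase.FINISHED),
            new CommitEvent(CommitEvent.Scope.GROUP, "g1", CommitEvent.Phase.STARTED));
        // g1 never finished, so recovery should be refused.
        System.out.println("recoverable = " + noInProgressCommits(log));
    }
}
```

Keying on scope plus name keeps a vertex commit and a group commit that share a
name from cancelling each other out.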
> Umbrella for Tez Recovery Redesign
> ----------------------------------
>
> Key: TEZ-2581
> URL: https://issues.apache.org/jira/browse/TEZ-2581
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Attachments: TEZ-2581-WIP-1.patch, TEZ-2581-WIP-2.patch,
> TEZ-2581-WIP-3.patch, TEZ-2581-WIP-4.patch, TEZ-2581-WIP-5.patch,
> TEZ-2581-WIP-6.patch, TezRecoveryRedesignProposal.pdf,
> TezRecoveryRedesignV1.1.pdf
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)