[ https://issues.apache.org/jira/browse/TEZ-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238276#comment-14238276 ]
Hitesh Shah commented on TEZ-1744:
----------------------------------
Mostly looks good. The commit is pending on one question: are there sufficient
tests to catch the actual scenario during recovery itself?
> It is not necessary to check whether dag is commit in RecoveryTransition
> ------------------------------------------------------------------------
>
> Key: TEZ-1744
> URL: https://issues.apache.org/jira/browse/TEZ-1744
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.5.1
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Attachments: TEZ-1744.patch
>
>
> It is not necessary to check whether the DAG is committed in RecoveryTransition,
> because RecoveryParser already checks that by using the summary event.
> Copying the comments from TEZ-1737:
> bq. But even if the non-summary VertexFinishedEvent is seen, its
> VertexRecoverableEventsGeneratedEvent may still be lost. I think there's no
> guarantee that VertexRecoverableEventsGeneratedEvent is logged before
> VertexFinishedEvent.
> The expectation was that all tasks are completed before a vertex has
> finished. Also, a TaskFinishedEvent is only seen after all its data movement
> events are generated and therefore logged.
> The handling is for the general case where a lot of data movement events
> are generated, and the commit is started and then ended. In a scenario where
> the commit starts but does not end, the summary log helps catch the problem.
> Now, in a scenario where the commit finished successfully, the AM could have
> crashed before all data movement events were stored to recovery. In this
> scenario, we cannot do anything, as the commit has already been done but we
> have no idea what was lost. The crux of the answer to your question is that
> a committer cannot be invoked twice.
> Agreed that VertexRecoverableEventsGeneratedEvent is a different problem. In
> such cases, I believe that if a VertexRecoverableEventsGeneratedEvent is not
> seen before a VertexFinishedEvent is seen, additional handling is needed for
> that scenario too. If a VertexRecoverableEventsGeneratedEvent is always
> guaranteed to be generated for a vertex and it is not seen even though the
> vertex itself was seen to have completed, then that is a potentially
> non-recoverable case.
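To make the two cases discussed above concrete, here is a minimal Java sketch of how one vertex's recovery log might be classified: a commit that started but never finished (the case the summary log catches), and a vertex that finished before any VertexRecoverableEventsGeneratedEvent was logged. All names here (RecoveryCheckSketch, LoggedEvent, Verdict, classify) are hypothetical and are not part of the Tez API. The sketch assumes the log for a single vertex is available as an ordered list, and that the caller knows whether a VertexRecoverableEventsGeneratedEvent is always expected for that vertex. It only classifies the log; it does not model what Tez actually does in each case.

```java
import java.util.List;

// Hypothetical illustration of the recovery checks discussed in TEZ-1744/TEZ-1737.
// None of these types or methods exist in Tez; they are simplified stand-ins.
public class RecoveryCheckSketch {

    // Simplified stand-ins for the recovery-log entries of one vertex.
    enum LoggedEvent {
        VERTEX_RECOVERABLE_EVENTS_GENERATED,
        VERTEX_FINISHED,
        COMMIT_STARTED,
        COMMIT_FINISHED
    }

    enum Verdict {
        RECOVERABLE,
        COMMIT_INCOMPLETE,           // commit started but no commit-finished entry
        POTENTIALLY_NON_RECOVERABLE  // vertex finished before recoverable events were logged
    }

    static Verdict classify(List<LoggedEvent> log, boolean recoverableEventsExpected) {
        boolean commitStarted = false;
        boolean commitFinished = false;
        boolean recoverableEventsSeen = false;
        boolean finishedWithoutRecoverableEvents = false;

        // Replay the log in the order it was written.
        for (LoggedEvent e : log) {
            switch (e) {
                case COMMIT_STARTED:
                    commitStarted = true;
                    break;
                case COMMIT_FINISHED:
                    commitFinished = true;
                    break;
                case VERTEX_RECOVERABLE_EVENTS_GENERATED:
                    recoverableEventsSeen = true;
                    break;
                case VERTEX_FINISHED:
                    // Only a problem if the recoverable events were not logged first.
                    if (!recoverableEventsSeen) {
                        finishedWithoutRecoverableEvents = true;
                    }
                    break;
            }
        }

        // A committer cannot be invoked twice, so a half-done commit cannot simply be retried.
        if (commitStarted && !commitFinished) {
            return Verdict.COMMIT_INCOMPLETE;
        }
        // Only meaningful if the recoverable events are always expected for this vertex.
        if (recoverableEventsExpected && finishedWithoutRecoverableEvents) {
            return Verdict.POTENTIALLY_NON_RECOVERABLE;
        }
        return Verdict.RECOVERABLE;
    }

    public static void main(String[] args) {
        System.out.println(classify(List.of(
                LoggedEvent.VERTEX_RECOVERABLE_EVENTS_GENERATED,
                LoggedEvent.VERTEX_FINISHED), true));  // prints RECOVERABLE
        System.out.println(classify(List.of(
                LoggedEvent.VERTEX_FINISHED), true));  // prints POTENTIALLY_NON_RECOVERABLE
        System.out.println(classify(List.of(
                LoggedEvent.VERTEX_RECOVERABLE_EVENTS_GENERATED,
                LoggedEvent.VERTEX_FINISHED,
                LoggedEvent.COMMIT_STARTED), true));   // prints COMMIT_INCOMPLETE
    }
}
```

The commit check runs first because, once a commit has started without finishing, the missing-events question no longer matters: the committer cannot be invoked a second time either way.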
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)