[
https://issues.apache.org/jira/browse/TEZ-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113363#comment-14113363
]
Jeff Zhang commented on TEZ-1345:
---------------------------------
[~bikassaha]
bq. Why are we calling VertexImpl method directly from the vertex manager? This
is a leaky abstraction that may affect future routing code changes. Also, it is
likely that VertexManager activity may be moved away from the dispatcher thread
because it could block normal execution while running user code.
Notice that VertexManager also call method setParallelism and scheduleTasks of
VertexImpl, maybe we should improve this later.
bq. The basic issue here seems to be that the Vertex state machine
RootInputInitializedTransition is changing state to INITED before saving the
input init events. Is it functionally incorrect to do so (effectively being
optimistic). If the init events are not saved and the AM crashes then the AM
would/could simply call the initializer again and start from scratch. Is that
not acceptable?
We don't know how many init event needs to be saved before vertex initialized,
so in recovery it is impossible to tell whether we have recovered part or all
of init events, then we don't know which initializer to call again. Or we could
just ignore all the init events of recovery log and just call all the
initializer again. ( in that case, we don't need to log all the init events,
since we won't use them in recovery)
bq. If it is unacceptable then, if InputInitializers are present, then
canInitVertex() could check if init events have been saved and only then return
true. We could add an inputInitEventsSavedTransition that could be triggered
after the store service has acked the save operation. It could call
canInitVertex() and transition to init if true (just like other cases). This
would slow down vertex init since the the save would block it from starting
tasks. Alternatively that transition could be called inline from the route
event transition when handling the input init events. The latter approach is
again optimistic and we come back to the question of whether being optimistic
is acceptable. If so then probably we should do nothing.
Add another check in canInitVertex() is feasible but as you said it would
affect the performance.
So overall IMO, I prefer to ignore the init events in recovery log and call
initializer again in recovery. It only affect the performance of recovery while
the method of adding check in canInitVertex would affect the performance of
normal run of dag. [~hitesh][~bikassaha] What's your thoughts ?
> Add checks to guarantee all init events are written to recovery to consider
> vertex initialized
> ----------------------------------------------------------------------------------------------
>
> Key: TEZ-1345
> URL: https://issues.apache.org/jira/browse/TEZ-1345
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Hitesh Shah
> Assignee: Jeff Zhang
> Attachments: Tez-1345-2.patch, Tez-1345.patch
>
>
> Related to issue discovered in TEZ-1033
--
This message was sent by Atlassian JIRA
(v6.2#6252)