[ 
https://issues.apache.org/jira/browse/TEZ-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113363#comment-14113363
 ] 

Jeff Zhang commented on TEZ-1345:
---------------------------------

[~bikassaha]
bq. Why are we calling VertexImpl method directly from the vertex manager? This 
is a leaky abstraction that may affect future routing code changes. Also, it is 
likely that VertexManager activity may be moved away from the dispatcher thread 
because it could block normal execution while running user code.
Notice that VertexManager also call method setParallelism and scheduleTasks of 
VertexImpl, maybe we should improve this later.

bq. The basic issue here seems to be that the Vertex state machine 
RootInputInitializedTransition is changing state to INITED before saving the 
input init events. Is it functionally incorrect to do so (effectively being 
optimistic). If the init events are not saved and the AM crashes then the AM 
would/could simply call the initializer again and start from scratch. Is that 
not acceptable?

We don't know how many init event needs to be saved before vertex initialized, 
so in recovery it is impossible to tell whether we have recovered part or all 
of init events, then we don't know which initializer to call again. Or we could 
just ignore all the init events of recovery log and just call all the 
initializer again. ( in that case, we don't need to log all the init events, 
since we won't use them in recovery)

bq. If it is unacceptable then, if InputInitializers are present, then 
canInitVertex() could check if init events have been saved and only then return 
true. We could add an inputInitEventsSavedTransition that could be triggered 
after the store service has acked the save operation. It could call 
canInitVertex() and transition to init if true (just like other cases). This 
would slow down vertex init since the the save would block it from starting 
tasks. Alternatively that transition could be called inline from the route 
event transition when handling the input init events. The latter approach is 
again optimistic and we come back to the question of whether being optimistic 
is acceptable. If so then probably we should do nothing.

Add another check in canInitVertex() is feasible but as you said it would 
affect the performance.

So overall IMO, I prefer to ignore the init events in recovery log and call 
initializer again in recovery. It only affect the performance of recovery while 
the method of adding check in canInitVertex would affect the performance of 
normal run of dag. [~hitesh][~bikassaha] What's your thoughts ?



> Add checks to guarantee all init events are written to recovery to consider 
> vertex initialized
> ----------------------------------------------------------------------------------------------
>
>                 Key: TEZ-1345
>                 URL: https://issues.apache.org/jira/browse/TEZ-1345
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Hitesh Shah
>            Assignee: Jeff Zhang
>         Attachments: Tez-1345-2.patch, Tez-1345.patch
>
>
> Related to issue discovered in TEZ-1033



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to