[ 
https://issues.apache.org/jira/browse/TEZ-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113324#comment-14113324
 ] 

Bikas Saha commented on TEZ-1345:
---------------------------------

Why are we calling VertexImpl method directly from the vertex manager? This is 
a leaky abstraction that may affect future routing code changes. Also, it is 
likely that VertexManager activity may be moved away from the dispatcher thread 
because it could block normal execution while running user code.
{code}-      appContext.getEventHandler().handle(
-          new VertexEventRouteEvent(managedVertex.getVertexId(), 
Lists.newArrayList(tezEvents)));
+      ((VertexImpl)managedVertex).routeEvents(Lists.newArrayList(tezEvents), 
false);{code}

The basic issue here seems to be that the Vertex state machine 
RootInputInitializedTransition is changing state to INITED before saving the 
input init events. Is it functionally incorrect to do so (effectively being 
optimistic). If the init events are not saved and the AM crashes then the AM 
would/could simply call the initializer again and start from scratch. Is that 
not acceptable?

If it is unacceptable then, if InputInitializers are present, then 
canInitVertex() could check if init events have been saved and only then return 
true. We could add an inputInitEventsSavedTransition that could be triggered 
after the store service has acked the save operation. It could call 
canInitVertex() and transition to init if true (just like other cases). This 
would slow down vertex init since the the save would block it from starting 
tasks. Alternatively that transition could be called inline from the route 
event transition when handling the input init events. The latter approach is 
again optimistic and we come back to the question of whether being optimistic 
is acceptable. If so then probably we should do nothing.

> Add checks to guarantee all init events are written to recovery to consider 
> vertex initialized
> ----------------------------------------------------------------------------------------------
>
>                 Key: TEZ-1345
>                 URL: https://issues.apache.org/jira/browse/TEZ-1345
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Hitesh Shah
>            Assignee: Jeff Zhang
>         Attachments: Tez-1345-2.patch, Tez-1345.patch
>
>
> Related to issue discovered in TEZ-1033



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to