[
https://issues.apache.org/jira/browse/TEZ-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113324#comment-14113324
]
Bikas Saha commented on TEZ-1345:
---------------------------------
Why are we calling a VertexImpl method directly from the vertex manager? This
is a leaky abstraction that may affect future routing code changes. Also, it is
likely that VertexManager activity may be moved off the dispatcher thread,
because running user code there could block normal execution.
{code}
- appContext.getEventHandler().handle(
-     new VertexEventRouteEvent(managedVertex.getVertexId(),
-         Lists.newArrayList(tezEvents)));
+ ((VertexImpl)managedVertex).routeEvents(Lists.newArrayList(tezEvents),
+     false);
{code}
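To illustrate the concern, here is a minimal sketch of the event-based routing that the removed lines relied on. All class names below (RouteEventHandler, RouteEvent, QueueingDispatcher, SketchVertexManager) are hypothetical stand-ins, not the real Tez classes. The point is only that the manager depends on a handler interface, so delivery can later move to a different thread without touching the manager.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-ins for illustration; these are not the real Tez types.
interface RouteEventHandler {
    void handle(RouteEvent event);
}

final class RouteEvent {
    final String vertexId;
    final List<String> payloads;

    RouteEvent(String vertexId, List<String> payloads) {
        this.vertexId = vertexId;
        this.payloads = new ArrayList<>(payloads); // defensive copy
    }
}

// Queues events; they are delivered later by whichever thread calls drain().
final class QueueingDispatcher implements RouteEventHandler {
    private final Queue<RouteEvent> pending = new ArrayDeque<>();
    private final List<String> delivered = new ArrayList<>();

    @Override
    public synchronized void handle(RouteEvent event) {
        pending.add(event);
    }

    synchronized void drain() {
        RouteEvent event;
        while ((event = pending.poll()) != null) {
            delivered.addAll(event.payloads);
        }
    }

    synchronized List<String> delivered() {
        return new ArrayList<>(delivered);
    }
}

// The manager only knows the handler interface, never the vertex implementation.
final class SketchVertexManager {
    private final RouteEventHandler handler;
    private final String vertexId;

    SketchVertexManager(RouteEventHandler handler, String vertexId) {
        this.handler = handler;
        this.vertexId = vertexId;
    }

    void onRootInputInitialized(List<String> events) {
        handler.handle(new RouteEvent(vertexId, events));
    }
}
```

With this shape, the manager never needs a cast to the concrete vertex class, which is the coupling the patch above introduces.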
The basic issue here seems to be that the Vertex state machine's
RootInputInitializedTransition changes state to INITED before saving the
input init events. Is it functionally incorrect to do so (effectively being
optimistic)? If the init events are not saved and the AM crashes, then the AM
would/could simply call the initializer again and start from scratch. Is that
not acceptable?
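A rough sketch of what the optimistic fallback could look like on recovery. InitEventRecovery and its method are hypothetical names, and the String events are placeholders for the real init events; the actual recovery path may differ.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper: reuse saved init events if recovery found them,
// otherwise run the initializer again from scratch.
final class InitEventRecovery {
    private InitEventRecovery() {}

    static List<String> recoverOrReinitialize(
            Optional<List<String>> savedInitEvents,
            Supplier<List<String>> initializer) {
        // The initializer is only invoked when nothing was saved.
        return savedInitEvents.orElseGet(initializer);
    }
}
```

The cost of this approach is that the initializer may run twice across AM attempts, so it is only safe if rerunning it is acceptable.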
If it is unacceptable, then when InputInitializers are present,
canInitVertex() could check whether the init events have been saved and only
then return true. We could add an inputInitEventsSavedTransition that is
triggered after the store service has acked the save operation. It could call
canInitVertex() and transition to init if true (just like other cases). This
would slow down vertex init, since the save would block it from starting
tasks. Alternatively, that transition could be called inline from the route
event transition when handling the input init events. The latter approach is
again optimistic, and we come back to the question of whether being optimistic
is acceptable. If so, then we should probably do nothing.
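A simplified sketch of the gated (non-optimistic) variant described above. VertexInitGate, SketchVertexState and the callback names are hypothetical and only model the check; this is not the actual VertexImpl state machine.

```java
// Hypothetical, simplified model of the proposed gating; not the actual VertexImpl.
enum SketchVertexState { INITIALIZING, INITED }

final class VertexInitGate {
    private final boolean hasInputInitializers;
    private boolean rootInputsInitialized;
    private boolean initEventsSaved;
    private SketchVertexState state = SketchVertexState.INITIALIZING;

    VertexInitGate(boolean hasInputInitializers) {
        this.hasInputInitializers = hasInputInitializers;
        // Without initializers there is nothing to wait for.
        this.rootInputsInitialized = !hasInputInitializers;
    }

    // True only once root inputs are initialized and, when initializers are
    // present, the store service has acked the save of their init events.
    boolean canInitVertex() {
        if (!rootInputsInitialized) {
            return false;
        }
        return !hasInputInitializers || initEventsSaved;
    }

    // Models the root input initialized transition.
    void onRootInputInitialized() {
        rootInputsInitialized = true;
        maybeInit();
    }

    // Models the proposed inputInitEventsSavedTransition.
    void onInputInitEventsSaved() {
        initEventsSaved = true;
        maybeInit();
    }

    private void maybeInit() {
        if (canInitVertex()) {
            state = SketchVertexState.INITED;
        }
    }

    SketchVertexState getState() {
        return state;
    }
}
```

In this model the vertex stays in INITIALIZING after the root inputs are initialized and moves to INITED only after the save ack, which is exactly the delay in starting tasks mentioned above.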
> Add checks to guarantee all init events are written to recovery to consider
> vertex initialized
> ----------------------------------------------------------------------------------------------
>
> Key: TEZ-1345
> URL: https://issues.apache.org/jira/browse/TEZ-1345
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Hitesh Shah
> Assignee: Jeff Zhang
> Attachments: Tez-1345-2.patch, Tez-1345.patch
>
>
> Related to issue discovered in TEZ-1033