[ 
https://issues.apache.org/jira/browse/TEZ-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119403#comment-14119403
 ] 

Jeff Zhang commented on TEZ-1345:
---------------------------------

[~hitesh] I have attached 2 patches for the following 2 solutions:

* (Tez-1345-5.patch) Add VertexManagerInitializationDone to the recovery log. 
This turned out to be more complicated than I expected: I had to create a new 
Event and a new HistoryEvent for it, and it also requires changing the code of 
StartRecoveryTransition.
* (Tez-1345-6.patch) On second thought, I think the reason that init events 
may be written to the recovery log after VertexInitEvent is that currently 
we just put them in the eventQueue of AsyncDispatcher (by using 
RouteEventTransition). I think the easy way is to handle the init events 
synchronously (Tez-1345.patch, Tez-1345-2.patch). I created the new patch 
Tez-1345-6.patch to cache the root init events in RootInitializerManager and 
write these events before writing VertexInitializedEvent. This ensures that 
all the root init events are written before VertexInitializedEvent, and in 
this way we don't need to change the code in StartRecoveryTransition.
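
To illustrate the ordering idea behind Tez-1345-6.patch, here is a minimal 
sketch, not the code from the patch. The class and method names 
(RecoveryLog, CachingInitializerManager, onRootInitEvent, onVertexInitialized) 
are hypothetical stand-ins, and events are plain strings instead of real 
HistoryEvent objects. It only shows that caching the root init events and 
flushing them right before the initialized event fixes their order in the log.

```java
import java.util.ArrayList;
import java.util.List;

public class RootInitEventOrdering {

    /** Hypothetical stand-in for the recovery log; keeps entries in write order. */
    public static class RecoveryLog {
        public final List<String> entries = new ArrayList<>();

        public void write(String entry) {
            entries.add(entry);
        }
    }

    /** Hypothetical stand-in for a manager that caches root init events. */
    public static class CachingInitializerManager {
        private final List<String> cachedInitEvents = new ArrayList<>();
        private final RecoveryLog log;

        public CachingInitializerManager(RecoveryLog log) {
            this.log = log;
        }

        /** Cache the event instead of writing (or routing) it immediately. */
        public void onRootInitEvent(String event) {
            cachedInitEvents.add(event);
        }

        /** Flush all cached init events first, then write the initialized event. */
        public void onVertexInitialized(String vertexId) {
            for (String event : cachedInitEvents) {
                log.write(event);
            }
            cachedInitEvents.clear();
            log.write("VertexInitializedEvent:" + vertexId);
        }
    }

    public static void main(String[] args) {
        RecoveryLog log = new RecoveryLog();
        CachingInitializerManager manager = new CachingInitializerManager(log);
        manager.onRootInitEvent("initEvent-1");
        manager.onRootInitEvent("initEvent-2");
        manager.onVertexInitialized("v1");
        // The init events always precede the initialized event in the log.
        System.out.println(log.entries);
    }
}
```

With this shape, a reader of the log that sees the initialized event can 
assume every root init event for that vertex is already in front of it.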

Regarding the performance concern [~bikassaha] mentioned, I think there is 
actually no performance issue here. Because the recovery log is written in a 
separate thread in RecoveryService, it won't block the AsyncDispatcher in Vertex. 
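
The general pattern behind that claim can be sketched as follows. This is 
only an illustration under assumptions, not the actual RecoveryService code: 
AsyncRecoveryWriter, handle and stopAndGetWritten are hypothetical names, and 
an in-memory list stands in for the recovery log file. The caller only 
enqueues the event and returns; a dedicated thread does the writing.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncRecoveryWriter {
    // Poison pill that tells the writer thread to exit (hypothetical convention).
    private static final String STOP = "__STOP__";

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> written = new CopyOnWriteArrayList<>();
    private final Thread writer = new Thread(this::drainQueue, "recovery-writer");

    public void start() {
        writer.start();
    }

    /** Called from the dispatcher side: enqueue and return without waiting. */
    public void handle(String event) {
        queue.add(event);
    }

    /** Runs on the writer thread: take events one by one and "write" them. */
    private void drainQueue() {
        try {
            while (true) {
                String event = queue.take();
                if (STOP.equals(event)) {
                    return;
                }
                written.add(event); // stands in for the (slow) log write
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Ask the writer to finish the queued events, wait for it, return the log. */
    public List<String> stopAndGetWritten() {
        queue.add(STOP);
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return written;
    }
}
```

Because a single writer thread drains a FIFO queue, events are still written 
in the order they were handed over, while the handing-over side never waits 
for the write itself.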

> Add checks to guarantee all init events are written to recovery to consider 
> vertex initialized
> ----------------------------------------------------------------------------------------------
>
>                 Key: TEZ-1345
>                 URL: https://issues.apache.org/jira/browse/TEZ-1345
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Hitesh Shah
>            Assignee: Jeff Zhang
>         Attachments: Tez-1345-2.patch, Tez-1345-3.patch, Tez-1345-4.patch, 
> Tez-1345-5.patch, Tez-1345-6.patch, Tez-1345.patch
>
>
> Related to issue discovered in TEZ-1033



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
