[ 
https://issues.apache.org/jira/browse/TEZ-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338228#comment-14338228
 ] 

Jeff Zhang commented on TEZ-1019:
---------------------------------

[~hitesh] Thanks for the review. I have attached a new patch that addresses the 
review comments.

bq. in restoreFromEvent, the code goes through manually defined paths instead 
of using existing transition functions resulting in duplication of logic.
This is a limitation of the current recovery process. Currently, recovery uses 
the flow below:
DAG::restoreFromEvent -> Vertex::restoreFromEvent -> Task::restoreFromEvent -> 
TaskAttempt::restoreFromEvent -> DAG::RecoveryTransition -> 
Vertex::RecoveryTransition -> Task::RecoveryTransition -> 
TaskAttempt::RecoveryTransition 
So we have to manually call some functions in Vertex::restoreFromEvent to create 
the tasks; otherwise Task::restoreFromEvent will throw an NPE because the task 
has not been created yet.
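
To illustrate the ordering problem, here is a minimal sketch. It is not the actual Tez code: RestoreOrderSketch, its nested Vertex and Task classes, and restoreTaskEvent are hypothetical stand-ins, and the event is a plain placeholder string. It only shows why the vertex-level restore has to create the tasks itself before any task-level restore can run.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; not the real Tez classes.
final class RestoreOrderSketch {

    static final class Task {
        String restoredState;

        void restoreFromEvent(String event) {
            restoredState = event;
        }
    }

    static final class Vertex {
        final Map<Integer, Task> tasks = new HashMap<>();

        // Vertex-level restore. In the restore pass the normal transition that
        // would create the tasks has not run, so they are created here by hand.
        void restoreFromEvent(int numTasks) {
            for (int i = 0; i < numTasks; i++) {
                tasks.put(i, new Task());
            }
        }

        // Routes a task-level recovery event to its task. If the vertex-level
        // restore has not created the task, tasks.get() returns null and the
        // call below throws a NullPointerException.
        void restoreTaskEvent(int taskIndex, String event) {
            tasks.get(taskIndex).restoreFromEvent(event);
        }
    }
}
```

Calling restoreTaskEvent before restoreFromEvent fails with an NPE in this sketch; calling them in the other order works. That manual task creation is the kind of duplicated logic the review comment points at.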
In theory, I think it is possible to completely align the recovery transitions 
with the normal transitions. For this, we need to refactor the current recovery 
process; TEZ-1657 tracks this.
We can first consolidate all the recovery logs into DagRecoveryData, and use this 
data to recover the DAG. The DAG will then follow the normal state machine to 
transition; when it needs to recover its vertices, we just need to extract 
VertexRecoveryData from the DagRecoveryData and use it to recover the vertices. 
The same applies to the task and task attempt. The flow then becomes:
DAG::RecoveryTransition -> Vertex::RecoveryTransition -> 
Task::RecoveryTransition -> TaskAttempt::RecoveryTransition
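
A rough sketch of the consolidated approach, under stated assumptions: only the names DagRecoveryData and VertexRecoveryData come from the proposal; their fields and methods, the RecoverySketch wrapper, recoverDag, and the placeholder event strings are all hypothetical. The point is that the logs are grouped once, and each level then hands the relevant slice down to the next level in a single top-down pass.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the real design is left to TEZ-1657.
final class RecoverySketch {

    // Per-vertex slice of the recovery log (assumed shape).
    static final class VertexRecoveryData {
        final String vertexId;
        final List<String> events = new ArrayList<>();

        VertexRecoveryData(String vertexId) {
            this.vertexId = vertexId;
        }
    }

    // Consolidated view of all recovery logs for one DAG (assumed shape).
    static final class DagRecoveryData {
        private final Map<String, VertexRecoveryData> byVertex = new LinkedHashMap<>();

        // Consolidation step: file each logged event under its vertex.
        void addEvent(String vertexId, String event) {
            byVertex.computeIfAbsent(vertexId, VertexRecoveryData::new).events.add(event);
        }

        VertexRecoveryData forVertex(String vertexId) {
            return byVertex.get(vertexId);
        }

        Iterable<String> vertexIds() {
            return byVertex.keySet();
        }
    }

    // Single top-down pass: the DAG extracts each vertex's slice and hands it
    // to that vertex. Returns a trace of what each vertex received, in order.
    static List<String> recoverDag(DagRecoveryData data) {
        List<String> trace = new ArrayList<>();
        for (String vertexId : data.vertexIds()) {
            VertexRecoveryData slice = data.forVertex(vertexId);
            // A real vertex would enter its RecoveryTransition here and pass
            // task-level slices further down in the same way.
            trace.add(vertexId + ":" + String.join(",", slice.events));
        }
        return trace;
    }
}
```

With this shape there is no separate restoreFromEvent pass, so nothing has to be created by hand ahead of the state machine.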

But this change is too big for this patch, so I think we can handle it in a 
separate JIRA.




> Re-factor routing of events to use common code path for normal and recovery 
> flow.
> ---------------------------------------------------------------------------------
>
>                 Key: TEZ-1019
>                 URL: https://issues.apache.org/jira/browse/TEZ-1019
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Hitesh Shah
>            Assignee: Jeff Zhang
>         Attachments: TEZ-1019-2.patch, TEZ-1019-3.patch, TEZ-1019-4.patch, 
> TEZ-1019-5.patch, Tez-1019.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
