[ 
https://issues.apache.org/jira/browse/TEZ-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992880#comment-14992880
 ] 

Jeff Zhang edited comment on TEZ-2581 at 11/6/15 1:24 AM:
----------------------------------------------------------

Right, we need a way to differentiate the 2 cases:
* case 1: from -1 to numTasks
* case 2: from numTasks1 to numTasks2
Currently I add a new flag to VertexReconfigureDoneEvent to differentiate 
these 2 cases (if vertexReconfigurePlanned was called, it is the second case; 
correct me if I am wrong).
In recovery, this flag decides where to restore the vertex status:
* If this flag is false, restore the data in the init stage 
(Vertex#assignVertexManager)
* If this flag is true, restore the data in the running stage 
(VertexManagerPlugin#onVertexStarted)
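
A minimal sketch of the flag-based decision described above. All names here 
(RecoverySketch, ReconfigureDoneRecord, reconfigurePlanned, RestoreStage) are 
hypothetical stand-ins for illustration, not the actual Tez classes or the 
contents of the WIP patches:

```java
// Hypothetical sketch only; these are not the real Tez recovery classes.
final class RecoverySketch {

    /** Stage in which the recovered reconfiguration data is re-applied. */
    enum RestoreStage { INIT, RUNNING }

    /** Stand-in for the recovered reconfigure-done event carrying the new flag. */
    static final class ReconfigureDoneRecord {
        final int numTasks;
        final boolean reconfigurePlanned;

        ReconfigureDoneRecord(int numTasks, boolean reconfigurePlanned) {
            this.numTasks = numTasks;
            this.reconfigurePlanned = reconfigurePlanned;
        }
    }

    /**
     * Flag false: case 1 (-1 to numTasks), restore in the init stage.
     * Flag true: case 2 (numTasks1 to numTasks2), restore in the running stage.
     */
    static RestoreStage restoreStage(ReconfigureDoneRecord record) {
        return record.reconfigurePlanned ? RestoreStage.RUNNING : RestoreStage.INIT;
    }
}
```

With this shape, the recovery path only reads the recorded flag and does not 
need to re-derive which case the previous AM attempt was in.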

Regarding your method, I have 2 concerns:
>>> always call reconfigurationPlanned() in VM.initialize().
This changes the behavior from the last AM attempt, which might bring risk for 
the next recovery (if the AM crashes again).
>>> If numTasks < 0 then it has to fake a trigger by setting up a timer.
Setting up a timer looks a little complicated to me. It brings extra behavior 
into recovery.

[~bikassaha] Any concerns about my current approach described above?




> Umbrella for Tez Recovery Redesign
> ----------------------------------
>
>                 Key: TEZ-2581
>                 URL: https://issues.apache.org/jira/browse/TEZ-2581
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>         Attachments: TEZ-2581-WIP-1.patch, TEZ-2581-WIP-2.patch, 
> TEZ-2581-WIP-3.patch, TEZ-2581-WIP-4.patch, TEZ-2581-WIP-5.patch, 
> TEZ-2581-WIP-6.patch, TEZ-2581-WIP-7.patch, TEZ-2581-WIP-8.patch, 
> TEZ-2581-WIP-9.patch, TezRecoveryRedesignProposal.pdf, 
> TezRecoveryRedesignV1.1.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
