[ 
https://issues.apache.org/jira/browse/TEZ-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201296#comment-14201296
 ] 

Jeff Zhang commented on TEZ-1734:
---------------------------------

bq. Does this mean that in certain scenarios such as FAILED and KILLED, we 
should ignore recovered events. What about other states such as NEW, etc?
Yes, I think so. If it is possible for a vertex to end up in such a status 
(current state & recoveredEvents) in the first attempt, it is normal for us to 
recover it to that status.

bq. The invalid state transition is interesting - did that happen only in a 
unit test? 
It happens in the following case.
{code}
v1(inited)     v2(inited)
   \                /
    \              /
      v3(inited)
{code}
When recovering, v1 sends V_Start to itself and makes v3 move to RECOVERING; 
v1 then sends V_SOURCE_VERTEX_STARTED, which is OK for v3 to receive since it 
is still in RECOVERING. 
Then v2 sends V_Start to itself and sends V_SOURCE_VERTEX_RECOVERED to v3, 
which makes v3 go to RUNNING. After that, when the V_Start event is processed 
(v2 started), v2 sends V_SOURCE_VERTEX_STARTED to v3, which is already in 
RUNNING; this causes the test case failure.
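
The ordering above can be replayed with a toy model. This is only a sketch: ToyVertex, its two states and the rejected list are hypothetical and are not the real Tez VertexImpl state machine. It assumes that v3 starts in RECOVERING, that the recovered event from the last source moves it to RUNNING, and that RUNNING has no transition for V_SOURCE_VERTEX_STARTED.

```java
import java.util.ArrayList;
import java.util.List;

public class RecoveryOrderingSketch {

    enum State { RECOVERING, RUNNING }

    enum Event { V_SOURCE_VERTEX_STARTED, V_SOURCE_VERTEX_RECOVERED }

    /** Toy stand-in for v3; NOT the real Tez VertexImpl state machine. */
    static class ToyVertex {
        State state = State.RECOVERING;
        final List<Event> rejected = new ArrayList<>();

        void handle(Event event) {
            switch (state) {
                case RECOVERING:
                    // Assumption: the recovered event from the last source
                    // moves v3 to RUNNING.
                    if (event == Event.V_SOURCE_VERTEX_RECOVERED) {
                        state = State.RUNNING;
                    }
                    // V_SOURCE_VERTEX_STARTED is accepted here, no state change.
                    break;
                case RUNNING:
                    // Assumption: RUNNING has no transition for
                    // V_SOURCE_VERTEX_STARTED, so the event is recorded as invalid.
                    if (event == Event.V_SOURCE_VERTEX_STARTED) {
                        rejected.add(event);
                    }
                    break;
            }
        }
    }

    /** Replays the event order from the comment and returns the rejected events. */
    static List<Event> replay() {
        ToyVertex v3 = new ToyVertex();
        v3.handle(Event.V_SOURCE_VERTEX_STARTED);   // from v1, v3 still RECOVERING
        v3.handle(Event.V_SOURCE_VERTEX_RECOVERED); // from v2, v3 moves to RUNNING
        v3.handle(Event.V_SOURCE_VERTEX_STARTED);   // from v2, arrives after RUNNING
        return v3.rejected;
    }

    public static void main(String[] args) {
        System.out.println("Rejected events: " + replay());
    }
}
```

In this model only the last event is rejected, which corresponds to the late V_SOURCE_VERTEX_STARTED from v2.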

> Vertex's taskNum may be -1 when recovered from NEW to FAILED/KILLED
> -------------------------------------------------------------------
>
>                 Key: TEZ-1734
>                 URL: https://issues.apache.org/jira/browse/TEZ-1734
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.1
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>         Attachments: TEZ-1734-2.patch, TEZ-1734.patch
>
>
> When a vertex is recovered from NEW to FAILED/KILLED, the taskNum may be -1. 
> In this case, we don't need to recover its tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
