[
https://issues.apache.org/jira/browse/TEZ-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201296#comment-14201296
]
Jeff Zhang commented on TEZ-1734:
---------------------------------
bq. Does this mean that in certain scenarios such as FAILED and KILLED, we
should ignore recovered events. What about other states such as NEW, etc?
Yes, I think so. If it is possible for a vertex to end up in such a status
(current state & recoveredEvents) in the first attempt, it is normal for us to
recover it to that status.
bq. The invalid state transition is interesting - did that happen only in a
unit test?
It happens in the following case.
{code}
v1(inited)   v2(inited)
     \         /
      \       /
      v3(inited)
{code}
During recovery, v1 sends V_Start to itself, which moves v3 to RECOVERING, and
then sends V_SOURCE_VERTEX_STARTED to v3. It is OK for v3 to get this event
since it is still in RECOVERING.
Then v2 sends V_Start to itself and sends V_SOURCE_VERTEX_RECOVERED to v3,
which makes v3 go to RUNNING. After that, when v2's V_Start event is processed
(v2 started), v2 sends V_SOURCE_VERTEX_STARTED to v3, which is no longer in
RECOVERING. This causes the invalid state transition and the test case failure.
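The event ordering above can be replayed with a toy state machine. This is only
a minimal sketch and not Tez's VertexImpl: the class name
RecoveryTransitionSketch, the reduced state and event sets, and the transition
table (RECOVERING accepts both events, RUNNING accepts neither) are assumptions
made for illustration. It also assumes v3 is already in RECOVERING and that a
single recovered event from v2 completes its recovery.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Toy model of v3 only; NOT Tez's VertexImpl. The transition table is an assumption.
class RecoveryTransitionSketch {
    enum State { RECOVERING, RUNNING }
    enum Event { V_SOURCE_VERTEX_STARTED, V_SOURCE_VERTEX_RECOVERED }

    // Assumed table: RECOVERING accepts both events, RUNNING accepts neither.
    private static final Map<State, Set<Event>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.RECOVERING, EnumSet.allOf(Event.class));
        ALLOWED.put(State.RUNNING, EnumSet.noneOf(Event.class));
    }

    // Assumption: v1's start has already moved v3 to RECOVERING.
    private State state = State.RECOVERING;

    State getState() {
        return state;
    }

    void handle(Event event) {
        if (!ALLOWED.get(state).contains(event)) {
            throw new IllegalStateException("Invalid event " + event + " at " + state);
        }
        // Simplification: a single RECOVERED event (from v2) completes recovery.
        if (event == Event.V_SOURCE_VERTEX_RECOVERED) {
            state = State.RUNNING;
        }
    }

    public static void main(String[] args) {
        RecoveryTransitionSketch v3 = new RecoveryTransitionSketch();
        v3.handle(Event.V_SOURCE_VERTEX_STARTED);   // from v1: accepted in RECOVERING
        v3.handle(Event.V_SOURCE_VERTEX_RECOVERED); // from v2: v3 moves to RUNNING
        try {
            v3.handle(Event.V_SOURCE_VERTEX_STARTED); // from v2: arrives after RUNNING
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the third event is rejected because it arrives after v3 has left
RECOVERING, which mirrors the ordering problem described above.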
> Vertex's taskNum may be -1 when recovered from NEW to FAILED/KILLED
> -------------------------------------------------------------------
>
> Key: TEZ-1734
> URL: https://issues.apache.org/jira/browse/TEZ-1734
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.5.1
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Attachments: TEZ-1734-2.patch, TEZ-1734.patch
>
>
> When a vertex is recovered from NEW to FAILED/KILLED, its taskNum may be -1. In
> this case, we don't need to recover its tasks.