[
https://issues.apache.org/jira/browse/TEZ-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257001#comment-15257001
]
Eric Badger commented on TEZ-3213:
----------------------------------
[~hitesh], I don't believe this cascading failure can occur in DAGImpl,
TaskImpl, or TaskAttemptImpl. It occurred in VertexImpl because there was no
transition from the RECOVERING state to the ERROR state on a V_INTERNAL_ERROR
event. Every other state in VertexImpl already handles V_INTERNAL_ERROR.

Before the transition was added, a failure during recovery would trigger a
V_INTERNAL_ERROR event. The state machine had no transition defined for that
event in the RECOVERING state, so it reported an invalid transition, which in
turn generated another V_INTERNAL_ERROR event. That event hit the same missing
transition, and the cycle repeated indefinitely, producing the looping failure
messages.
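
To make the loop concrete, here is a minimal sketch of the mechanism. It is a
simplified model, not the actual VertexImpl code: the class VertexLoopSketch,
the reduced State and Event enums, the transition table, and the MAX_EVENTS cap
are all assumptions made for illustration. The cap exists only so the demo
terminates.

```java
import java.util.ArrayDeque;
import java.util.EnumMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical, simplified model of the failure; NOT the real Tez VertexImpl.
class VertexLoopSketch {
    enum State { RECOVERING, RUNNING, ERROR }
    enum Event { V_START, V_INTERNAL_ERROR }

    // Safety cap so the demo terminates (an assumption of this sketch).
    static final int MAX_EVENTS = 10;

    // Builds the transition table; withFix adds RECOVERING -> ERROR.
    static Map<State, Map<Event, State>> buildTable(boolean withFix) {
        Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
        for (State s : State.values()) {
            table.put(s, new EnumMap<>(Event.class));
        }
        table.get(State.RUNNING).put(Event.V_INTERNAL_ERROR, State.ERROR);
        table.get(State.ERROR).put(Event.V_INTERNAL_ERROR, State.ERROR);
        if (withFix) {
            table.get(State.RECOVERING).put(Event.V_INTERNAL_ERROR, State.ERROR);
        }
        return table;
    }

    // Returns how many invalid transitions were logged before the queue
    // drained or the safety cap was reached.
    static int simulate(boolean withFix) {
        Map<State, Map<Event, State>> table = buildTable(withFix);
        State state = State.RECOVERING;
        Queue<Event> queue = new ArrayDeque<>();
        queue.add(Event.V_INTERNAL_ERROR); // the initial failure
        int invalid = 0;
        int processed = 0;
        while (!queue.isEmpty() && processed < MAX_EVENTS) {
            Event event = queue.remove();
            processed++;
            State next = table.get(state).get(event);
            if (next == null) {
                // No transition defined: log it and raise another internal
                // error, which hits the same gap on the next iteration.
                invalid++;
                System.out.println("Invalid event " + event + " in state " + state);
                queue.add(Event.V_INTERNAL_ERROR);
            } else {
                state = next;
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + simulate(false) + " invalid transitions");
        System.out.println("with fix: " + simulate(true) + " invalid transitions");
    }
}
```

In this sketch, simulate(false) keeps re-queuing V_INTERNAL_ERROR until the cap
stops it, while simulate(true) moves to ERROR on the first event and logs no
invalid transitions.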

DAGImpl handles the INTERNAL_ERROR event in all of its states, so there is no
issue there. From what I can tell, TaskImpl and TaskAttemptImpl have no
equivalent internal error event, so I think all of our bases are covered.
> Uncaught exception during vertex recovery leads to invalid state transition
> loop
> --------------------------------------------------------------------------------
>
> Key: TEZ-3213
> URL: https://issues.apache.org/jira/browse/TEZ-3213
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Jason Lowe
> Assignee: Eric Badger
> Attachments: TEZ-3213-b0.7.001.patch
>
>
> If an uncaught exception occurs during a state transition from the RECOVERING
> vertex state, then V_INTERNAL_ERROR will be delivered to the state machine,
> but that event is not handled in the RECOVERING state. That in turn causes
> another V_INTERNAL_ERROR event to be delivered to the state machine, and it
> loops, logging the invalid transitions.