[ https://issues.apache.org/jira/browse/TEZ-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15239706#comment-15239706 ]

Jason Lowe commented on TEZ-3213:
---------------------------------

Sample log showing the initial error and the subsequent loop:
{noformat}
2016-04-12 08:46:23,002 [ERROR] [Dispatcher thread {Central}] |impl.VertexImpl|: Uncaught Exception when handling event V_SOURCE_VERTEX_RECOVERED on vertex scope-4784 with vertexId vertex_1459233834927_3098531_1_14 at current state RECOVERING
java.lang.RuntimeException: Invalid Vertex state, found non-zero recovered events in invalid state, recoveredState=KILLED, recoveredEvents=3840
        at org.apache.tez.dag.app.dag.impl.VertexImpl$RecoverTransition.transition(VertexImpl.java:3298)
        at org.apache.tez.dag.app.dag.impl.VertexImpl$RecoverTransition.transition(VertexImpl.java:3004)
        at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1875)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:202)
        at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2115)
        at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2101)
        at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
        at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
        at java.lang.Thread.run(Thread.java:745)
[...]
2016-04-12 08:46:23,062 [ERROR] [Dispatcher thread {Central}] |impl.VertexImpl|: Can't handle Invalid event V_INTERNAL_ERROR on vertex scope-4784 with vertexId vertex_1459233834927_3098531_1_14 at current state RECOVERING
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: V_INTERNAL_ERROR at RECOVERING
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1875)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:202)
        at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2115)
        at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2101)
        at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
        at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
        at java.lang.Thread.run(Thread.java:745)
[...]
2016-04-12 08:46:23,086 [ERROR] [Dispatcher thread {Central}] |impl.VertexImpl|: Can't handle Invalid event V_INTERNAL_ERROR on vertex scope-4784 with vertexId vertex_1459233834927_3098531_1_14 at current state RECOVERING
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: V_INTERNAL_ERROR at RECOVERING
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1875)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:202)
        at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2115)
        at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2101)
        at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
        at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
        at java.lang.Thread.run(Thread.java:745)
[...]
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: V_INTERNAL_ERROR at RECOVERING
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1875)
        at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:202)
        at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2115)
        at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2101)
        at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
        at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
        at java.lang.Thread.run(Thread.java:745)
{noformat}


> Uncaught exception during vertex recovery leads to invalid state transition loop
> --------------------------------------------------------------------------------
>
>                 Key: TEZ-3213
>                 URL: https://issues.apache.org/jira/browse/TEZ-3213
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Jason Lowe
>
> If an uncaught exception occurs during a state transition while the vertex is
> in the RECOVERING state, a V_INTERNAL_ERROR event is delivered to the state
> machine, but that event is not handled in the RECOVERING state. The resulting
> invalid-transition error in turn causes another V_INTERNAL_ERROR event to be
> delivered to the state machine, so it loops, logging the invalid transitions.
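
The loop described above can be reproduced with a minimal, self-contained model. This is a hypothetical sketch, not Tez code: it does not use the YARN StateMachineFactory, and the class RecoveryLoopDemo, its methods, and the RECOVERING to ERROR transition used as a "fix" are assumptions for illustration only. It models just the pattern from the report: any failure while handling an event is turned into a V_INTERNAL_ERROR event, and if RECOVERING has no transition for that event, handling it fails again and enqueues yet another V_INTERNAL_ERROR.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical, simplified model of the TEZ-3213 loop. Not Tez/YARN code;
 * state, event, and method names are illustrative assumptions.
 */
public class RecoveryLoopDemo {
  public enum State { RECOVERING, ERROR }
  public enum Event { V_SOURCE_VERTEX_RECOVERED, V_INTERNAL_ERROR }

  private State state = State.RECOVERING;
  private final Deque<Event> queue = new ArrayDeque<>();
  // Assumed knob: whether RECOVERING has a transition for V_INTERNAL_ERROR.
  private final boolean handleInternalErrorInRecovering;
  private int invalidTransitions = 0;

  public RecoveryLoopDemo(boolean handleInternalErrorInRecovering) {
    this.handleInternalErrorInRecovering = handleInternalErrorInRecovering;
  }

  // Returns the next state, or null if no transition is registered for (state, event).
  private State transition(Event event) {
    if (state == State.RECOVERING) {
      if (event == Event.V_SOURCE_VERTEX_RECOVERED) {
        // Simulates the recovery transition throwing, as in the first log entry.
        throw new RuntimeException("Invalid Vertex state");
      }
      if (event == Event.V_INTERNAL_ERROR && handleInternalErrorInRecovering) {
        return State.ERROR;  // hypothetical terminal state
      }
    }
    return null;
  }

  // Every failure, including an invalid transition, is converted into V_INTERNAL_ERROR.
  private void handle(Event event) {
    try {
      State next = transition(event);
      if (next == null) {
        invalidTransitions++;               // "Can't handle Invalid event ..."
        queue.add(Event.V_INTERNAL_ERROR);
      } else {
        state = next;
      }
    } catch (RuntimeException e) {
      queue.add(Event.V_INTERNAL_ERROR);    // "Uncaught Exception when handling event ..."
    }
  }

  /** Dispatches events until the queue drains or maxEvents is hit; returns the count dispatched. */
  public int run(Event initial, int maxEvents) {
    queue.add(initial);
    int dispatched = 0;
    while (!queue.isEmpty() && dispatched < maxEvents) {
      handle(queue.poll());
      dispatched++;
    }
    return dispatched;
  }

  public State state() { return state; }
  public int invalidTransitions() { return invalidTransitions; }

  public static void main(String[] args) {
    RecoveryLoopDemo buggy = new RecoveryLoopDemo(false);
    int n = buggy.run(Event.V_SOURCE_VERTEX_RECOVERED, 1000);
    System.out.println("buggy: dispatched=" + n + " state=" + buggy.state()
        + " invalidTransitions=" + buggy.invalidTransitions());

    RecoveryLoopDemo fixed = new RecoveryLoopDemo(true);
    n = fixed.run(Event.V_SOURCE_VERTEX_RECOVERED, 1000);
    System.out.println("fixed: dispatched=" + n + " state=" + fixed.state());
  }
}
```

Without a V_INTERNAL_ERROR transition out of RECOVERING, the model only stops because of the maxEvents cap (1000 events dispatched, 999 invalid transitions, vertex still RECOVERING); with one, the queue drains after two events. This report does not say whether the actual fix for TEZ-3213 takes this form.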



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
