[ https://issues.apache.org/jira/browse/TEZ-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209381#comment-14209381 ]
Hitesh Shah commented on TEZ-1772:
----------------------------------
bq. Then, in the case that the summary event of VertexFinished is seen but the
non-summary VertexFinished event is not seen, should we recover all its tasks to
the desired state rather than recovering it to failed as in the above code?
The problem is that there is no way to know whether the tasks generated events
that are needed by destination vertices. In this case, given that we have lost
information, there is no clear way to recover, especially as a committer has run
(and there is no guarantee that it can be re-run).
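
To make the reasoning concrete, here is a minimal sketch of the decision being argued for. It is not the actual VertexImpl recovery code: the class, enum, and method names are hypothetical, and the two booleans stand in for whatever the recovery log actually exposes. The handling of the case where neither event was seen is also an assumption.

```java
// Hypothetical sketch only; none of these names are Tez APIs.
class VertexRecoverySketch {

    enum Outcome {
        RESTORE_FINISHED_STATE,   // full VertexFinished event is available
        RECOVER_AS_FAILED,        // only the summary event survived
        CONTINUE_NORMAL_RECOVERY  // assumption: vertex never reported finishing
    }

    // summaryFinishedSeen: the summary VertexFinished event was found.
    // fullFinishedSeen:    the non-summary VertexFinished event was found.
    static Outcome decide(boolean summaryFinishedSeen, boolean fullFinishedSeen) {
        if (fullFinishedSeen) {
            // All information about the finished vertex is present.
            return Outcome.RESTORE_FINISHED_STATE;
        }
        if (summaryFinishedSeen) {
            // The vertex finished and a committer has run, but we cannot tell
            // which task events the destination vertices still need, and the
            // committer may not be safe to re-run.
            return Outcome.RECOVER_AS_FAILED;
        }
        return Outcome.CONTINUE_NORMAL_RECOVERY;
    }
}
```

The point of the middle branch is that recovering to failed is the conservative choice once information has been lost.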
> Failing tests post TEZ-1737
> ---------------------------
>
> Key: TEZ-1772
> URL: https://issues.apache.org/jira/browse/TEZ-1772
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Hitesh Shah
> Assignee: Jeff Zhang
> Priority: Blocker
> Attachments: TEZ-1772.patch
>
>
> org.apache.tez.test.TestAMRecovery.testVertexCompletelyFinished_One2One
> org.apache.tez.test.TestAMRecovery.testVertexCompletelyFinished_Broadcast
> org.apache.tez.test.TestDAGRecovery.testBasicRecovery
> {code}
> 2014-11-13 08:30:58,720 ERROR [AsyncDispatcher event handler] impl.VertexImpl: Exception in VertexManager, vertex=vertex_1415838634393_0001_1_01 [v2]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: org.apache.tez.dag.api.TezUncheckedException: Managed task number must equal 1-1 source task number, oneToOneSrcTaskCount =0,numManagedTasks=2
>     at org.apache.tez.dag.app.dag.impl.VertexManager.onVertexStarted(VertexManager.java:368)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl.recoveryCodeSimulatingStart(VertexImpl.java:2417)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl.access$9(VertexImpl.java:2416)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl$RecoverTransition.transition(VertexImpl.java:2721)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl$RecoverTransition.transition(VertexImpl.java:1)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1526)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1)
>     at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1741)
>     at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.tez.dag.api.TezUncheckedException: Vertex=v2Managed task number must equal 1-1 source task number, oneToOneSrcTaskCount =0,numManagedTasks=2
>     at org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.onVertexStarted(InputReadyVertexManager.java:114)
>     at org.apache.tez.test.TestAMRecovery$ControlledInputReadyVertexManager.onVertexStarted(TestAMRecovery.java:520)
>     at org.apache.tez.dag.app.dag.impl.VertexManager.onVertexStarted(VertexManager.java:365)
>     ... 16 more
> {code}