[
https://issues.apache.org/jira/browse/TEZ-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209508#comment-14209508
]
Jeff Zhang commented on TEZ-1772:
---------------------------------
bq. The problem is that there is no way to know if the tasks generated events
which are needed by destination vertices. In this case, given that we have lost
information, there is no clear way to recover especially as a committer has run
( and there is no guarantee whether it can be re-run ).
* But even if the non-summary VertexFinishedEvent is seen, its
VertexRecoverableEventsGeneratedEvent may still be lost. I think there is no
guarantee that VertexRecoverableEventsGeneratedEvent is logged before
VertexFinishedEvent.
* And how does this relate to the committer? I think the committer doesn't
need any events. If the summary VertexFinishedEvent is seen, that should mean
the commit has completed.
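To make the first point concrete, here is a minimal sketch of the ordering
problem. It is not Tez code: the RecoveryLogCheck class, the EventType enum
and the finishedWithoutGeneratedEvents helper are hypothetical names used only
for illustration, and the recovery log is modelled as a plain list of event
types in the order they were recovered.

```java
import java.util.Arrays;
import java.util.List;

public class RecoveryLogCheck {
    // Hypothetical event kinds, named after the events discussed above.
    enum EventType {
        VERTEX_STARTED,
        VERTEX_RECOVERABLE_EVENTS_GENERATED,
        VERTEX_FINISHED
    }

    /**
     * Returns true if the log contains a finish record that is not preceded
     * by a generated-events record, i.e. the case where the vertex looks
     * finished but the events its destination vertices need are missing.
     */
    static boolean finishedWithoutGeneratedEvents(List<EventType> log) {
        boolean sawGenerated = false;
        for (EventType e : log) {
            if (e == EventType.VERTEX_RECOVERABLE_EVENTS_GENERATED) {
                sawGenerated = true;
            }
            if (e == EventType.VERTEX_FINISHED) {
                return !sawGenerated;
            }
        }
        return false; // no finish record in the log at all
    }

    public static void main(String[] args) {
        // Finish record recovered, generated-events record lost.
        List<EventType> lossy = Arrays.asList(
            EventType.VERTEX_STARTED,
            EventType.VERTEX_FINISHED);
        // Both records recovered, in the order one would hope for.
        List<EventType> complete = Arrays.asList(
            EventType.VERTEX_STARTED,
            EventType.VERTEX_RECOVERABLE_EVENTS_GENERATED,
            EventType.VERTEX_FINISHED);

        System.out.println(finishedWithoutGeneratedEvents(lossy));    // true
        System.out.println(finishedWithoutGeneratedEvents(complete)); // false
    }
}
```

If nothing guarantees the ordering, seeing only the finish record does not
distinguish between "no events were generated" and "the events record was
lost".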
> Failing tests post TEZ-1737
> ---------------------------
>
> Key: TEZ-1772
> URL: https://issues.apache.org/jira/browse/TEZ-1772
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Hitesh Shah
> Assignee: Jeff Zhang
> Priority: Blocker
> Attachments: TEZ-1772.patch
>
>
> org.apache.tez.test.TestAMRecovery.testVertexCompletelyFinished_One2One
> org.apache.tez.test.TestAMRecovery.testVertexCompletelyFinished_Broadcast
> org.apache.tez.test.TestDAGRecovery.testBasicRecovery
> {code}
> 2014-11-13 08:30:58,720 ERROR [AsyncDispatcher event handler]
> impl.VertexImpl: Exception in VertexManager,
> vertex=vertex_1415838634393_0001_1_01 [v2]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException:
> org.apache.tez.dag.api.TezUncheckedException: Managed task number must equal
> 1-1 source task number, oneToOneSrcTaskCount =0,numManagedTasks=2
> at
> org.apache.tez.dag.app.dag.impl.VertexManager.onVertexStarted(VertexManager.java:368)
> at
> org.apache.tez.dag.app.dag.impl.VertexImpl.recoveryCodeSimulatingStart(VertexImpl.java:2417)
> at
> org.apache.tez.dag.app.dag.impl.VertexImpl.access$9(VertexImpl.java:2416)
> at
> org.apache.tez.dag.app.dag.impl.VertexImpl$RecoverTransition.transition(VertexImpl.java:2721)
> at
> org.apache.tez.dag.app.dag.impl.VertexImpl$RecoverTransition.transition(VertexImpl.java:1)
> at
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at
> org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
> at
> org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1526)
> at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1)
> at
> org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1741)
> at
> org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1)
> at
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.tez.dag.api.TezUncheckedException: Vertex=v2Managed
> task number must equal 1-1 source task number, oneToOneSrcTaskCount
> =0,numManagedTasks=2
> at
> org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.onVertexStarted(InputReadyVertexManager.java:114)
> at
> org.apache.tez.test.TestAMRecovery$ControlledInputReadyVertexManager.onVertexStarted(TestAMRecovery.java:520)
> at
> org.apache.tez.dag.app.dag.impl.VertexManager.onVertexStarted(VertexManager.java:365)
> ... 16 more
> {code}