[ https://issues.apache.org/jira/browse/TEZ-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209292#comment-14209292 ]

Jeff Zhang commented on TEZ-1772:
---------------------------------

bq. Is there a reason why numTasks is not written to both summary and non-summary?

No special reason; I just wanted to keep the change small.

bq. A vertexFinished event in recovery log is only expected to be seen after all tasks/task attempts completions are seen. A summary event is seen out of order and therefore when trying to recover a completed vertex, using just a summary event will only give partial info. The summary event is mainly to catch commit in progress errors.

Then, in the case where the summary VertexFinished event is seen but the non-summary VertexFinished event is not, should we recover all of the vertex's tasks to their desired state rather than recovering the vertex to FAILED, as the code above does?
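A minimal Java sketch of the two options being weighed, assuming a vertex can be classified purely from whether each VertexFinished record was found in the recovery log. The names `VertexRecoverySketch`, `RecoveredState` and `classify` are hypothetical and are not Tez APIs; `FAILED` models the behaviour of the code referred to above, and `RECOVER_TASKS` models the alternative raised in the question.

```java
/**
 * Hypothetical sketch (not Tez code): classify a vertex during AM recovery
 * from which VertexFinished records were found in the recovery log.
 */
public class VertexRecoverySketch {

    enum RecoveredState { NOT_FINISHED, FINISHED_FULL_INFO, FAILED, RECOVER_TASKS }

    /**
     * @param summarySeen    VertexFinished was found in the summary log
     * @param nonSummarySeen VertexFinished was found in the non-summary log
     * @param recoverTasks   true to take the alternative from the question
     *                       (recover tasks individually) instead of failing
     */
    static RecoveredState classify(boolean summarySeen, boolean nonSummarySeen,
                                   boolean recoverTasks) {
        if (nonSummarySeen) {
            // The non-summary VertexFinished is only expected after all task and
            // task attempt completions, so full vertex state is available.
            return RecoveredState.FINISHED_FULL_INFO;
        }
        if (summarySeen) {
            // Summary events may be seen out of order and carry partial info only.
            return recoverTasks ? RecoveredState.RECOVER_TASKS : RecoveredState.FAILED;
        }
        return RecoveredState.NOT_FINISHED;
    }

    public static void main(String[] args) {
        System.out.println(classify(true, false, false)); // FAILED
        System.out.println(classify(true, false, true));  // RECOVER_TASKS
        System.out.println(classify(true, true, false));  // FINISHED_FULL_INFO
    }
}
```

The `recoverTasks` flag only makes the choice explicit; which branch is correct is exactly what the comment asks.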


> Failing tests post TEZ-1737
> ---------------------------
>
>                 Key: TEZ-1772
>                 URL: https://issues.apache.org/jira/browse/TEZ-1772
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Assignee: Jeff Zhang
>            Priority: Blocker
>         Attachments: TEZ-1772.patch
>
>
> org.apache.tez.test.TestAMRecovery.testVertexCompletelyFinished_One2One
> org.apache.tez.test.TestAMRecovery.testVertexCompletelyFinished_Broadcast
> org.apache.tez.test.TestDAGRecovery.testBasicRecovery
> {code}
> 2014-11-13 08:30:58,720 ERROR [AsyncDispatcher event handler] impl.VertexImpl: Exception in VertexManager, vertex=vertex_1415838634393_0001_1_01 [v2]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: org.apache.tez.dag.api.TezUncheckedException: Managed task number must equal 1-1 source task number, oneToOneSrcTaskCount =0,numManagedTasks=2
>       at org.apache.tez.dag.app.dag.impl.VertexManager.onVertexStarted(VertexManager.java:368)
>       at org.apache.tez.dag.app.dag.impl.VertexImpl.recoveryCodeSimulatingStart(VertexImpl.java:2417)
>       at org.apache.tez.dag.app.dag.impl.VertexImpl.access$9(VertexImpl.java:2416)
>       at org.apache.tez.dag.app.dag.impl.VertexImpl$RecoverTransition.transition(VertexImpl.java:2721)
>       at org.apache.tez.dag.app.dag.impl.VertexImpl$RecoverTransition.transition(VertexImpl.java:1)
>       at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>       at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>       at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>       at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>       at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
>       at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1526)
>       at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1)
>       at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1741)
>       at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1)
>       at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>       at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.tez.dag.api.TezUncheckedException: Vertex=v2Managed task number must equal 1-1 source task number, oneToOneSrcTaskCount =0,numManagedTasks=2
>       at org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.onVertexStarted(InputReadyVertexManager.java:114)
>       at org.apache.tez.test.TestAMRecovery$ControlledInputReadyVertexManager.onVertexStarted(TestAMRecovery.java:520)
>       at org.apache.tez.dag.app.dag.impl.VertexManager.onVertexStarted(VertexManager.java:365)
>       ... 16 more
> {code}
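The check that fails in the trace can be sketched as below, assuming it simply compares each one-to-one source's task count with the number of tasks the managed vertex owns. `OneToOneCheckSketch` and `checkOneToOne` are hypothetical names, not the actual `InputReadyVertexManager` code. With the values from the log (source count 0, two managed tasks) the sketch rejects the vertex; one possible reading is that the source vertex's task count was not yet restored when `onVertexStarted` ran during recovery.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch (not Tez code) of a one-to-one task count check. */
public class OneToOneCheckSketch {

    /**
     * Throws if any one-to-one source has a task count different from the
     * number of tasks managed by the vertex.
     */
    static void checkOneToOne(String vertexName, int numManagedTasks,
                              Map<String, Integer> oneToOneSrcTaskCounts) {
        for (Map.Entry<String, Integer> src : oneToOneSrcTaskCounts.entrySet()) {
            int srcCount = src.getValue();
            if (srcCount != numManagedTasks) {
                throw new IllegalStateException("Vertex=" + vertexName
                        + ": managed task number must equal 1-1 source task number, source="
                        + src.getKey() + ", oneToOneSrcTaskCount=" + srcCount
                        + ", numManagedTasks=" + numManagedTasks);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> sources = new LinkedHashMap<>();
        sources.put("v1", 0); // source task count reported as 0, as in the trace
        try {
            checkOneToOne("v2", 2, sources);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```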



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)