[ 
https://issues.apache.org/jira/browse/TEZ-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209041#comment-14209041
 ] 

Jeff Zhang commented on TEZ-1772:
---------------------------------

BTW, why do we recover from both the summary logs and the non-summary logs? I 
found the following piece of code in VertexImpl.java. (Why treat this case as a 
commit failure? Can we just mark the vertex as SUCCEEDED?)

{code}
          if (vertex.recoveredState == VertexState.SUCCEEDED
              && vertex.hasCommitter
              && vertex.summaryCompleteSeen && !vertex.vertexCompleteSeen) {
            String msg = "Cannot recover vertex as all recovery events not"
                + " found, vertex=" + vertex.logIdentifier
                + ", hasCommitters=" + vertex.hasCommitter
                + ", summaryCompletionSeen=" + vertex.summaryCompleteSeen
                + ", finalCompletionSeen=" + vertex.vertexCompleteSeen;
            LOG.warn(msg);
            vertex.finished(VertexState.FAILED,
                VertexTerminationCause.COMMIT_FAILURE, msg);
            endState = VertexState.FAILED;
          }
{code}
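
The condition quoted above can be isolated as a small predicate to make clear which combination of flags triggers the failure. This is only a sketch: `RecoveryCheck`, `failsRecovery`, `endState`, and the two-value `VertexState` enum are hypothetical stand-ins written for illustration, not the real Tez classes, and the sketch assumes that a vertex not matching the condition keeps its recovered state.

```java
// Hypothetical, simplified stand-in for Tez's VertexState (illustration only).
enum VertexState { SUCCEEDED, FAILED }

// Hypothetical helper; not part of Tez.
final class RecoveryCheck {
    private RecoveryCheck() {}

    // Same boolean condition as the quoted VertexImpl snippet: the vertex was
    // recovered as SUCCEEDED, it has a committer, the summary completion event
    // was seen, but the final vertex completion event was not.
    static boolean failsRecovery(VertexState recoveredState,
                                 boolean hasCommitter,
                                 boolean summaryCompleteSeen,
                                 boolean vertexCompleteSeen) {
        return recoveredState == VertexState.SUCCEEDED
            && hasCommitter
            && summaryCompleteSeen
            && !vertexCompleteSeen;
    }

    // Assumption: when the condition does not hold, the recovered state is kept.
    static VertexState endState(VertexState recoveredState,
                                boolean hasCommitter,
                                boolean summaryCompleteSeen,
                                boolean vertexCompleteSeen) {
        return failsRecovery(recoveredState, hasCommitter,
                             summaryCompleteSeen, vertexCompleteSeen)
            ? VertexState.FAILED
            : recoveredState;
    }
}
```

Under this sketch, only the combination (SUCCEEDED, committer present, summary completion seen, final completion not seen) ends in FAILED. A vertex without a committer, or one whose final completion event was recovered, stays SUCCEEDED.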

> Failing tests post TEZ-1737
> ---------------------------
>
>                 Key: TEZ-1772
>                 URL: https://issues.apache.org/jira/browse/TEZ-1772
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Assignee: Jeff Zhang
>            Priority: Blocker
>         Attachments: TEZ-1772.patch
>
>
> org.apache.tez.test.TestAMRecovery.testVertexCompletelyFinished_One2One
> org.apache.tez.test.TestAMRecovery.testVertexCompletelyFinished_Broadcast
> org.apache.tez.test.TestDAGRecovery.testBasicRecovery
> {code}
> 2014-11-13 08:30:58,720 ERROR [AsyncDispatcher event handler] impl.VertexImpl: Exception in VertexManager, vertex=vertex_1415838634393_0001_1_01 [v2]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: org.apache.tez.dag.api.TezUncheckedException: Managed task number must equal 1-1 source task number, oneToOneSrcTaskCount =0,numManagedTasks=2
>       at org.apache.tez.dag.app.dag.impl.VertexManager.onVertexStarted(VertexManager.java:368)
>       at org.apache.tez.dag.app.dag.impl.VertexImpl.recoveryCodeSimulatingStart(VertexImpl.java:2417)
>       at org.apache.tez.dag.app.dag.impl.VertexImpl.access$9(VertexImpl.java:2416)
>       at org.apache.tez.dag.app.dag.impl.VertexImpl$RecoverTransition.transition(VertexImpl.java:2721)
>       at org.apache.tez.dag.app.dag.impl.VertexImpl$RecoverTransition.transition(VertexImpl.java:1)
>       at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>       at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>       at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>       at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>       at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
>       at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1526)
>       at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1)
>       at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1741)
>       at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1)
>       at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>       at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.tez.dag.api.TezUncheckedException: Vertex=v2Managed task number must equal 1-1 source task number, oneToOneSrcTaskCount =0,numManagedTasks=2
>       at org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.onVertexStarted(InputReadyVertexManager.java:114)
>       at org.apache.tez.test.TestAMRecovery$ControlledInputReadyVertexManager.onVertexStarted(TestAMRecovery.java:520)
>       at org.apache.tez.dag.app.dag.impl.VertexManager.onVertexStarted(VertexManager.java:365)
>       ... 16 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
