[ https://issues.apache.org/jira/browse/TEZ-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hitesh Shah updated TEZ-2107:
-----------------------------
    Affects Version/s: 0.5.0

> Recovery failure in the case of Auto-reduce parallelism
> -------------------------------------------------------
>
>                 Key: TEZ-2107
>                 URL: https://issues.apache.org/jira/browse/TEZ-2107
>             Project: Apache Tez
>          Issue Type: Sub-task
>    Affects Versions: 0.6.0
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>
> The following error occurs during recovery when auto-reduce parallelism is in
> effect. The number of tasks is reduced from 2 to 1, but the upstream vertex's
> DataMovementEvent is still routed to task 2 (destTaskIndex=1), which was
> removed when parallelism was auto-reduced.
> {code}
> 2015-02-16 09:11:54,587 FATAL [Dispatcher thread: Central] common.AsyncDispatcher: Error in dispatcher thread
> org.apache.tez.dag.api.TezUncheckedException: Unexpected null task. sourceVertex=vertex_1424048826974_0002_1_00 [scope-47] srcTaskIndex = 0 destVertex=vertex_1424048826974_0002_1_01 [scope-50] destTaskIndex=1 destNumTasks=1 edgeManager=org.apache.tez.dag.app.dag.impl.ScatterGatherEdgeManager
>     at org.apache.tez.dag.app.dag.impl.Edge.sendDmEventOrIfEventToTasks(Edge.java:358)
>     at org.apache.tez.dag.app.dag.impl.Edge.sendTezEventToDestinationTasks(Edge.java:422)
>     at org.apache.tez.dag.app.dag.impl.Edge.handleCompositeDataMovementEvent(Edge.java:310)
>     at org.apache.tez.dag.app.dag.impl.Edge.sendTezEventToDestinationTasks(Edge.java:378)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl.handleRoutedTezEvents(VertexImpl.java:3795)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl.access$3600(VertexImpl.java:187)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl$RouteEventTransition.transition(VertexImpl.java:3708)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl$RouteEventTransition.transition(VertexImpl.java:3700)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1575)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:186)
>     at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1802)
>     at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1788)
>     at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>     at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> The following exception also occurs sometimes:
> {code}
> 2015-06-10 08:02:03,417 ERROR [Dispatcher thread: Central] impl.VertexImpl: Exception in VertexManager, vertex:vertex_1433894507873_0001_1_01 [Summation]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: org.apache.tez.dag.api.TezUncheckedException: Atleast 1 bipartite source should exist, vertexName=Summation
>     at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerCallback.onFailure(VertexManager.java:516)
>     at com.google.common.util.concurrent.Futures$6.run(Futures.java:977)
>     at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:253)
>     at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:149)
>     at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:134)
>     at com.google.common.util.concurrent.ListenableFutureTask.done(ListenableFutureTask.java:86)
>     at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:380)
>     at java.util.concurrent.FutureTask.setException(FutureTask.java:247)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:267)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.tez.dag.api.TezUncheckedException: Atleast 1 bipartite source should exist, vertexName=Summation
>     at org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager.onVertexStarted(ShuffleVertexManager.java:459)
>     at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventOnVertexStarted.invoke(VertexManager.java:585)
>     at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:656)
>     at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:1)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>     at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:651)
>     at org.apache.tez.dag.app.dag.event.CallableEvent.call(CallableEvent.java:1)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     ... 3 more
> {code}
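The first trace reports an event being routed to destTaskIndex=1 while destNumTasks=1, i.e. to a task that no longer exists after auto-reduce shrank the downstream vertex from 2 tasks to 1. The sketch below is a simplified, hypothetical model of that mismatch, not Tez code. It assumes that plain scatter-gather routing sends source partition p to destination task p, and that after auto-reduce consecutive partitions are grouped into contiguous ranges of size ceil(originalTasks / reducedTasks). The class and method names (`AutoReduceRoutingSketch`, `naiveDestTask`, `reducedDestTask`, `isRoutable`) are illustrative only.

```java
/**
 * Hypothetical model of scatter-gather routing before and after auto-reduce
 * parallelism. Not Tez code: the names and the grouping rule are assumptions.
 */
public class AutoReduceRoutingSketch {

    /** Assumed plain scatter-gather rule: source partition p goes to destination task p. */
    static int naiveDestTask(int partition) {
        return partition;
    }

    /**
     * Assumed auto-reduce rule: consecutive partitions are grouped into contiguous
     * ranges of size ceil(originalTasks / reducedTasks), one range per remaining task.
     */
    static int reducedDestTask(int partition, int originalTasks, int reducedTasks) {
        int range = (originalTasks + reducedTasks - 1) / reducedTasks; // ceiling division
        return partition / range;
    }

    /** A destination index can be routed only if it names an existing task. */
    static boolean isRoutable(int destTaskIndex, int destNumTasks) {
        return destTaskIndex >= 0 && destTaskIndex < destNumTasks;
    }

    public static void main(String[] args) {
        int originalTasks = 2; // downstream parallelism before auto-reduce
        int reducedTasks = 1;  // downstream parallelism after auto-reduce
        for (int p = 0; p < originalTasks; p++) {
            int stale = naiveDestTask(p);
            int remapped = reducedDestTask(p, originalTasks, reducedTasks);
            System.out.printf("partition=%d stale=%d (routable=%b) remapped=%d (routable=%b)%n",
                    p, stale, isRoutable(stale, reducedTasks),
                    remapped, isRoutable(remapped, reducedTasks));
        }
    }
}
```

Running `main` shows partition 1 mapping to stale index 1, which is not routable when only one task remains, while the range-based mapping sends it to task 0. Under these assumptions, routing that uses the original one-to-one mapping after parallelism has been reduced reproduces the `destTaskIndex=1 destNumTasks=1` combination in the first trace.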