[ 
https://issues.apache.org/jira/browse/TEZ-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434243#comment-15434243
 ] 

Zhiyuan Yang commented on TEZ-3416:
-----------------------------------

VertexConfigurationDoneEvent contains the parallelism of the current vertex and 
the source-edge mapping. So Reducer 3 can find the correct parallelism for itself, 
and the SG edge for Reducer 2. But the SG edge gets the source vertex parallelism 
by invoking getContext().getSourceVertexNumTasks() (not from recovery data), and 
that value is not determined before recovery.
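
To illustrate the failure mode, here is a minimal sketch, not the actual Tez 
code. It assumes the edge manager sizes a per-source-task table once, from 
whatever task count its context reports at that moment, and later indexes that 
table by source task index. SketchEdgeManager, routeBySourceTask and 
routingFails are hypothetical names, and the counts 3 and 4 are chosen only to 
match the task index seen in the stack trace.

```java
import java.util.function.IntSupplier;

public class StaleParallelismSketch {

    /** Hypothetical stand-in for an edge manager; not the real ScatterGatherEdgeManager. */
    static final class SketchEdgeManager {
        // One routing entry per source task, sized once at setup time.
        private final int[] routeBySourceTask;

        SketchEdgeManager(IntSupplier sourceVertexNumTasks) {
            // Assumption: the count is read from the context, not from recovery data.
            int numSourceTasks = sourceVertexNumTasks.getAsInt();
            routeBySourceTask = new int[numSourceTasks];
            for (int i = 0; i < numSourceTasks; i++) {
                routeBySourceTask[i] = i;
            }
        }

        int route(int sourceTaskIndex) {
            // Throws ArrayIndexOutOfBoundsException if the table was sized too small.
            return routeBySourceTask[sourceTaskIndex];
        }
    }

    /** Returns true if routing an event from sourceTaskIndex fails with an out-of-bounds access. */
    static boolean routingFails(int numTasksSeenAtSetup, int sourceTaskIndex) {
        SketchEdgeManager manager = new SketchEdgeManager(() -> numTasksSeenAtSetup);
        try {
            manager.route(sourceTaskIndex);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Table sized from a stale count of 3, event arrives from source task 3.
        System.out.println("stale count: fails = " + routingFails(3, 3));
        // Table sized from a count of 4, the same event routes normally.
        System.out.println("correct count: fails = " + routingFails(4, 3));
    }
}
```

Under these assumptions, the routing call fails only when the table was built 
from a count smaller than the real number of source tasks, which would also 
explain why the problem does not show up on every recovery.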

> ArrayIndexOutOfBoundsException happens in ScatterGatherEdgeManager after DAG 
> recovery
> -------------------------------------------------------------------------------------
>
>                 Key: TEZ-3416
>                 URL: https://issues.apache.org/jira/browse/TEZ-3416
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Zhiyuan Yang
>            Assignee: Zhiyuan Yang
>         Attachments: applog
>
>
> This happened in Hive on Tez. The query is select count(*) from f as a, f as 
> b. When I manually killed the AM, all vertices except the one that joins the 
> data had finished reconfiguration. After recovery, the DAG failed with 
> ArrayIndexOutOfBoundsException in ScatterGatherEdgeManager of the counting 
> vertex. It doesn't happen every time.
> {code:java}
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: Fail to 
> maybeAddTezEventForDestinationTask, 
> event:org.apache.tez.runtime.api.events.CompositeDataMovementEvent@5eca05b9, 
> sourceInfo:{ producerConsumerType=OUTPUT, taskVertexName=Reducer 2, 
> edgeVertexName=Reducer 3, 
> taskAttemptId=attempt_1471650900010_0018_1_02_000003_0 }, 
> destinationInfo:null, EdgeInfo: sourceVertexName=Reducer 2, 
> destinationVertexName=Reducer 3
>       at 
> org.apache.tez.dag.app.dag.impl.Edge.maybeAddTezEventForDestinationTask(Edge.java:659)
>       at 
> org.apache.tez.dag.app.dag.impl.VertexImpl.getTaskAttemptTezEvents(VertexImpl.java:3720)
>       at 
> org.apache.tez.dag.app.TaskCommunicatorManager.heartbeat(TaskCommunicatorManager.java:363)
>       at 
> org.apache.tez.dag.app.TaskCommunicatorContextImpl.heartbeat(TaskCommunicatorContextImpl.java:98)
>       at 
> org.apache.tez.dag.app.TezTaskCommunicatorImpl$TezTaskUmbilicalProtocolImpl.heartbeat(TezTaskCommunicatorImpl.java:384)
>       at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at 
> org.apache.hadoop.ipc.WritableRpcEngine$Server$WritableRpcInvoker.call(WritableRpcEngine.java:514)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
>       at 
> org.apache.tez.dag.app.dag.impl.ScatterGatherEdgeManager.routeCompositeDataMovementEventToDestination(ScatterGatherEdgeManager.java:120)
>       at 
> org.apache.tez.dag.app.dag.impl.Edge.maybeAddTezEventForDestinationTask(Edge.java:560)
>       ... 15 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
