[ https://issues.apache.org/jira/browse/TEZ-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434243#comment-15434243 ]
Zhiyuan Yang commented on TEZ-3416:
-----------------------------------
VertexConfigurationDoneEvent contains the parallelism of the current vertex and
the source-edge mapping. So Reducer 3 can recover the correct parallelism for
itself, as well as the ScatterGather (SG) edge from Reducer 2. However, the SG
edge gets the source vertex parallelism by calling
getContext().getSourceVertexNumTasks() (not from the recovery data), and that
value has not yet been determined at this point in recovery.
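To make the failure mode concrete, here is a minimal, hypothetical Java sketch. It assumes (this is not taken from the Tez source) that the edge manager keeps a per-source-task array whose length is fixed from the source parallelism it sees when it is initialized. If that value is stale, say 3, while Reducer 2 actually runs a task with index 3 (as in attempt_1471650900010_0018_1_02_000003_0 in the trace), the lookup throws ArrayIndexOutOfBoundsException. The class and method names are invented for illustration.

```java
// Hypothetical illustration only; NOT the actual Tez ScatterGatherEdgeManager.
public class StaleParallelismDemo {

    /** Keeps one entry per source task, sized once from the source parallelism. */
    public static final class PerSourceRouteCache {
        private final int[] entryBySourceTask;

        public PerSourceRouteCache(int sourceNumTasksAtInit) {
            // The length is frozen at whatever source parallelism was visible
            // when the cache was built (assumed here to be a stale value).
            entryBySourceTask = new int[sourceNumTasksAtInit];
            for (int i = 0; i < entryBySourceTask.length; i++) {
                entryBySourceTask[i] = i; // placeholder per-source routing entry
            }
        }

        public int route(int sourceTaskIndex) {
            // Throws ArrayIndexOutOfBoundsException when sourceTaskIndex is
            // >= the parallelism the cache was sized with.
            return entryBySourceTask[sourceTaskIndex];
        }
    }

    public static void main(String[] args) {
        // Assumed numbers: parallelism visible at init = 3, real source tasks = 4.
        PerSourceRouteCache stale = new PerSourceRouteCache(3);
        for (int task = 0; task < 4; task++) {
            try {
                System.out.println("source task " + task + " -> entry " + stale.route(task));
            } catch (ArrayIndexOutOfBoundsException e) {
                System.out.println("source task " + task + " -> " + e);
            }
        }
        // Sized from the real parallelism, the same lookup for task 3 succeeds.
        PerSourceRouteCache sized = new PerSourceRouteCache(4);
        System.out.println("source task 3 -> entry " + sized.route(3));
    }
}
```

On Java 8 the stale lookup for task 3 reports ArrayIndexOutOfBoundsException: 3, the same form as the "Caused by" line in the quoted trace; newer JDKs word the message differently. The sketch only shows why a stale source task count can produce this exception, not how the real edge manager stores its routing data.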
> ArrayIndexOutOfBoundsException happens in ScatterGatherEdgeManager after DAG
> recovery
> -------------------------------------------------------------------------------------
>
> Key: TEZ-3416
> URL: https://issues.apache.org/jira/browse/TEZ-3416
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Zhiyuan Yang
> Assignee: Zhiyuan Yang
> Attachments: applog
>
>
> This happened in Hive on Tez. The query is
> {{select count(*) from f as a, f as b}}. When I manually killed the AM, all
> vertices except the one joining the data had finished reconfiguration. After
> recovery, the DAG failed with an ArrayIndexOutOfBoundsException in the
> ScatterGatherEdgeManager of the counting vertex. It does not happen every time.
> {code:java}
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: Fail to maybeAddTezEventForDestinationTask, event:org.apache.tez.runtime.api.events.CompositeDataMovementEvent@5eca05b9, sourceInfo:{ producerConsumerType=OUTPUT, taskVertexName=Reducer 2, edgeVertexName=Reducer 3, taskAttemptId=attempt_1471650900010_0018_1_02_000003_0 }, destinationInfo:null, EdgeInfo: sourceVertexName=Reducer 2, destinationVertexName=Reducer 3
> at org.apache.tez.dag.app.dag.impl.Edge.maybeAddTezEventForDestinationTask(Edge.java:659)
> at org.apache.tez.dag.app.dag.impl.VertexImpl.getTaskAttemptTezEvents(VertexImpl.java:3720)
> at org.apache.tez.dag.app.TaskCommunicatorManager.heartbeat(TaskCommunicatorManager.java:363)
> at org.apache.tez.dag.app.TaskCommunicatorContextImpl.heartbeat(TaskCommunicatorContextImpl.java:98)
> at org.apache.tez.dag.app.TezTaskCommunicatorImpl$TezTaskUmbilicalProtocolImpl.heartbeat(TezTaskCommunicatorImpl.java:384)
> at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.apache.hadoop.ipc.WritableRpcEngine$Server$WritableRpcInvoker.call(WritableRpcEngine.java:514)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
> at org.apache.tez.dag.app.dag.impl.ScatterGatherEdgeManager.routeCompositeDataMovementEventToDestination(ScatterGatherEdgeManager.java:120)
> at org.apache.tez.dag.app.dag.impl.Edge.maybeAddTezEventForDestinationTask(Edge.java:560)
> ... 15 more
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)