Siddharth Seth created TEZ-2367:
-----------------------------------
Summary: Corruption of TezHeartbeatRequest
Key: TEZ-2367
URL: https://issues.apache.org/jira/browse/TEZ-2367
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Siddharth Seth
Assignee: Bikas Saha
Priority: Blocker
The following exception is seen in the AM logs while attempting to deserialize
a heartbeat request.
{code}
java.lang.ArrayIndexOutOfBoundsException: 1382376565
at org.apache.tez.runtime.api.impl.EventMetaData.readFields(EventMetaData.java:120)
at org.apache.tez.runtime.api.impl.TezEvent.readFields(TezEvent.java:271)
at org.apache.tez.runtime.api.impl.TezHeartbeatRequest.readFields(TezHeartbeatRequest.java:110)
at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
at org.apache.hadoop.ipc.WritableRpcEngine$Invocation.readFields(WritableRpcEngine.java:160)
at org.apache.hadoop.ipc.Server$Connection.processRpcRequest(Server.java:1869)
at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1801)
at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1559)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:784)
at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:650)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:621)
{code}
TEZ-2234 is what changed the serialization most recently. [~bikassaha] - mind
taking a look?
From a quick glance, it looks like this is caused by the way TaskStatistics
is serialized: ioStatistics.size is written first, followed by an iteration
over ioStatistics. ioStatistics can change between these two steps as different
Inputs / Outputs get initialized, so the written count may not match the number
of entries that follow. Synchronizing should fix this.
Also, setting the statistics may require synchronization to ensure correct
values are written.
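A minimal sketch of the synchronization idea, not the actual Tez code: StatsSketch, addIO and roundTrip are hypothetical names used only for illustration, and the map of String to long stands in for whatever ioStatistics really holds. The point is that the size and the entries are written under the same lock that guards updates, so the count the reader sees always matches the entries that follow.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for TaskStatistics; not the real Tez class. */
class StatsSketch {
    // Assumption: a plain map keyed by Input/Output name.
    private final Map<String, Long> ioStatistics = new HashMap<>();

    // Updates take the same lock as write(), so the map cannot
    // grow between writing the size and iterating the entries.
    public synchronized void addIO(String name, long value) {
        ioStatistics.put(name, value);
    }

    // Size and entries are written atomically with respect to addIO().
    public synchronized void write(DataOutput out) throws IOException {
        out.writeInt(ioStatistics.size());
        for (Map.Entry<String, Long> e : ioStatistics.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeLong(e.getValue());
        }
    }

    // Reads exactly as many entries as the size prefix announces.
    public synchronized void readFields(DataInput in) throws IOException {
        ioStatistics.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            String name = in.readUTF();
            ioStatistics.put(name, in.readLong());
        }
    }

    public synchronized int size() {
        return ioStatistics.size();
    }

    // Returns -1 when the name is absent (illustrative convention only).
    public synchronized long get(String name) {
        return ioStatistics.getOrDefault(name, -1L);
    }

    // Serializes to a byte array and deserializes into a fresh instance.
    static StatsSketch roundTrip(StatsSketch source) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            source.write(new DataOutputStream(bytes));
            StatsSketch copy = new StatsSketch();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Without the lock, an entry added after writeInt but before the loop finishes could leave the stream with more entries than the prefix claims, and the reader would then interpret the leftover bytes as the next field, which is consistent with the out-of-range index in the trace above.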