[ https://issues.apache.org/jira/browse/AVRO-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thiruvalluvan M. G. updated AVRO-1953:
--------------------------------------
Component/s: java
> ArrayIndexOutOfBoundsException in org.apache.avro.io.parsing.Symbol$Alternative.getSymbol
> -----------------------------------------------------------------------------------------
>
> Key: AVRO-1953
> URL: https://issues.apache.org/jira/browse/AVRO-1953
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.4
> Reporter: Yong Zhang
> Priority: Major
>
> We are facing an issue where our Avro MapReduce job cannot process an Avro
> file in the reducer.
> Here is the schema of our data:
> {
>   "namespace" : "our package name",
>   "type" : "record",
>   "name" : "Lists",
>   "fields" : [
>     {"name" : "account_id", "type" : "long"},
>     {"name" : "list_id", "type" : "string"},
>     {"name" : "sequence_id", "type" : ["int", "null"]},
>     {"name" : "name", "type" : ["string", "null"]},
>     {"name" : "state", "type" : ["string", "null"]},
>     {"name" : "description", "type" : ["string", "null"]},
>     {"name" : "dynamic_filtered_list", "type" : ["int", "null"]},
>     {"name" : "filter_criteria", "type" : ["string", "null"]},
>     {"name" : "created_at", "type" : ["long", "null"]},
>     {"name" : "updated_at", "type" : ["long", "null"]},
>     {"name" : "deleted_at", "type" : ["long", "null"]},
>     {"name" : "favorite", "type" : ["int", "null"]},
>     {"name" : "delta", "type" : ["boolean", "null"]},
>     {
>       "name" : "list_memberships", "type" : {
>         "type" : "array", "items" : {
>           "name" : "ListMembership", "type" : "record",
>           "fields" : [
>             {"name" : "channel_id", "type" : "string"},
>             {"name" : "created_at", "type" : ["long", "null"]},
>             {"name" : "created_source", "type" : ["string", "null"]},
>             {"name" : "deleted_at", "type" : ["long", "null"]},
>             {"name" : "sequence_id", "type" : ["int", "null"]}
>           ]
>         }
>       }
>     }
>   ]
> }
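>
> For reference, here is a minimal sketch (not our production code; the file
> name lists.avsc and the field values are placeholders) of a round trip
> through the generic API, which is the same read path the reducer uses:
>
> import java.io.File;
> import java.util.Collections;
> import org.apache.avro.Schema;
> import org.apache.avro.file.DataFileReader;
> import org.apache.avro.file.DataFileWriter;
> import org.apache.avro.generic.GenericData;
> import org.apache.avro.generic.GenericDatumReader;
> import org.apache.avro.generic.GenericDatumWriter;
> import org.apache.avro.generic.GenericRecord;
>
> public class ListsRoundTrip {
>     public static void main(String[] args) throws Exception {
>         // Parse the schema above, assumed saved as lists.avsc.
>         Schema schema = new Schema.Parser().parse(new File("lists.avsc"));
>
>         // Non-union fields must be set; unset union fields stay null,
>         // which is a valid branch of every ["...", "null"] union above.
>         GenericRecord rec = new GenericData.Record(schema);
>         rec.put("account_id", 42L);
>         rec.put("list_id", "list-1");
>         rec.put("list_memberships", new GenericData.Array<GenericRecord>(
>             schema.getField("list_memberships").schema(),
>             Collections.<GenericRecord>emptyList()));
>
>         // Write one record to a container file and read it back.
>         File f = new File("lists.avro");
>         DataFileWriter<GenericRecord> writer = new DataFileWriter<GenericRecord>(
>             new GenericDatumWriter<GenericRecord>(schema));
>         writer.create(schema, f);
>         writer.append(rec);
>         writer.close();
>
>         DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
>             f, new GenericDatumReader<GenericRecord>(schema));
>         while (reader.hasNext()) {
>             System.out.println(reader.next());
>         }
>         reader.close();
>     }
> }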
> Our MapReduce job computes the delta of the above dataset and uses our merge
> logic to fold the latest changes back into it. The job runs daily and has
> worked fine for 18 months. In that time we have seen the merge job fail twice
> with the following error, always in the reducer stage. That means the Avro
> data is read successfully by the mappers and sent to the reducers, where we
> sort it by key and timestamp so the delta can be merged on the reducer side:
> java.lang.ArrayIndexOutOfBoundsException
>     at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
>     at org.apache.avro.hadoop.io.AvroDeserializer.deserialize(AvroDeserializer.java:108)
>     at org.apache.avro.hadoop.io.AvroDeserializer.deserialize(AvroDeserializer.java:48)
>     at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:142)
>     at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:117)
>     at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:297)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:165)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:652)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at java.security.AccessController.doPrivileged(AccessController.java:366)
>     at javax.security.auth.Subject.doAs(Subject.java:572)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
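>
> Looking at the Avro source, Symbol$Alternative.getSymbol(int) indexes an
> array holding one symbol per union branch, using the branch index decoded
> from the stream; a branch index outside the union (for example from a
> mangled varint) throws exactly this ArrayIndexOutOfBoundsException. Here is
> a minimal sketch of that failure mode (the two-branch union and the
> hand-written index are ours, just for illustration):
>
> import java.io.ByteArrayOutputStream;
> import org.apache.avro.Schema;
> import org.apache.avro.generic.GenericDatumReader;
> import org.apache.avro.io.BinaryDecoder;
> import org.apache.avro.io.BinaryEncoder;
> import org.apache.avro.io.DecoderFactory;
> import org.apache.avro.io.EncoderFactory;
>
> public class BadUnionIndex {
>     public static void main(String[] args) throws Exception {
>         // A two-branch union like our nullable fields; valid indexes: 0, 1.
>         Schema union = new Schema.Parser().parse("[\"int\", \"null\"]");
>
>         // Hand-encode an out-of-range branch index, as corrupt bytes might.
>         ByteArrayOutputStream out = new ByteArrayOutputStream();
>         BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
>         enc.writeIndex(2);  // the union has no branch 2
>         enc.flush();
>
>         // GenericDatumReader decodes through a ResolvingDecoder, so the bad
>         // index should surface in Symbol$Alternative.getSymbol, as above.
>         BinaryDecoder dec = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
>         new GenericDatumReader<Object>(union).read(null, dec);
>     }
> }
>
> If that reading is right, a single bad branch index in one record would be
> enough to abort the whole reduce task, which matches what we observe.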
> The MapReduce job eventually fails in the reducer stage. I don't think our
> data is corrupted, as it is read fine in the map stage. Every time we hit
> this error we have to pull the whole huge dataset from the source again,
> rebuild the Avro files, and restart the daily merges, only to run into the
> same issue again after several months, for a reason we don't yet understand.
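>
> When this happens again, one thing we can try is to rule the on-disk files
> in or out with a standalone scan like the sketch below (the path argument is
> a placeholder), which reads every record through the same GenericDatumReader
> path and reports where decoding first breaks:
>
> import java.io.File;
> import org.apache.avro.file.DataFileReader;
> import org.apache.avro.generic.GenericDatumReader;
> import org.apache.avro.generic.GenericRecord;
>
> public class ScanAvroFile {
>     public static void main(String[] args) throws Exception {
>         // args[0] is a placeholder path to one suspect part file.
>         DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
>             new File(args[0]), new GenericDatumReader<GenericRecord>());
>         long count = 0;
>         try {
>             while (reader.hasNext()) {
>                 reader.next();
>                 count++;
>             }
>             System.out.println("OK: read " + count + " records");
>         } catch (Exception e) {
>             System.err.println("Decode failed after " + count + " records: " + e);
>         } finally {
>             reader.close();
>         }
>     }
> }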
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)