[ https://issues.apache.org/jira/browse/AVRO-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890975#comment-16890975 ]
michael elbaz commented on AVRO-1953:
-------------------------------------

Hello, is anyone taking care of this? https://issues.apache.org/jira/browse/CAMEL-13737

> ArrayIndexOutOfBoundsException in
> org.apache.avro.io.parsing.Symbol$Alternative.getSymbol
> -----------------------------------------------------------------------------------------
>
>                 Key: AVRO-1953
>                 URL: https://issues.apache.org/jira/browse/AVRO-1953
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.4
>            Reporter: Yong Zhang
>            Priority: Major
>
> We are facing an issue in which the Avro MapReduce job cannot process the Avro file in the reducer.
> Here is the schema of our data:
> {
>   "namespace" : "our package name",
>   "type" : "record",
>   "name" : "Lists",
>   "fields" : [
>     {"name" : "account_id", "type" : "long"},
>     {"name" : "list_id", "type" : "string"},
>     {"name" : "sequence_id", "type" : ["int", "null"]},
>     {"name" : "name", "type" : ["string", "null"]},
>     {"name" : "state", "type" : ["string", "null"]},
>     {"name" : "description", "type" : ["string", "null"]},
>     {"name" : "dynamic_filtered_list", "type" : ["int", "null"]},
>     {"name" : "filter_criteria", "type" : ["string", "null"]},
>     {"name" : "created_at", "type" : ["long", "null"]},
>     {"name" : "updated_at", "type" : ["long", "null"]},
>     {"name" : "deleted_at", "type" : ["long", "null"]},
>     {"name" : "favorite", "type" : ["int", "null"]},
>     {"name" : "delta", "type" : ["boolean", "null"]},
>     {
>       "name" : "list_memberships", "type" : {
>         "type" : "array", "items" : {
>           "name" : "ListMembership", "type" : "record",
>           "fields" : [
>             {"name" : "channel_id", "type" : "string"},
>             {"name" : "created_at", "type" : ["long", "null"]},
>             {"name" : "created_source", "type" : ["string", "null"]},
>             {"name" : "deleted_at", "type" : ["long", "null"]},
>             {"name" : "sequence_id", "type" : ["int", "null"]}
>           ]
>         }
>       }
>     }
>   ]
> }
> Our MapReduce job computes the delta of the above dataset and uses our merge logic to merge the latest changes into the dataset.
> The whole MR job runs daily and worked fine for 18 months. During this time, the merge MapReduce job failed twice with the following error. It fails in the reducer stage, which means the Avro data was read successfully and sent to the reducers, where we sort the data by key and timestamp so the delta can be merged on the reducer side:
> java.lang.ArrayIndexOutOfBoundsException
>     at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
>     at org.apache.avro.hadoop.io.AvroDeserializer.deserialize(AvroDeserializer.java:108)
>     at org.apache.avro.hadoop.io.AvroDeserializer.deserialize(AvroDeserializer.java:48)
>     at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:142)
>     at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:117)
>     at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:297)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:165)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:652)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at java.security.AccessController.doPrivileged(AccessController.java:366)
>     at javax.security.auth.Subject.doAs(Subject.java:572)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> The MapReduce job eventually fails in the reducer stage. I don't think our data is corrupted, since it is read fine in the map stage. Every time we got this error, we had to fetch the whole huge dataset from the source again, rebuild the Avro files, and restart the daily merge, until after several months we hit the issue again, for a reason we don't know yet.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
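For context on where `Symbol$Alternative.getSymbol` sits in the failure: Avro's binary format encodes a union value (such as the schema's `["int", "null"]` fields) as a zigzag-varint branch index followed by the value for that branch. The `ResolvingDecoder` looks the decoded index up in the union's list of alternatives, so a corrupted stream (or a resolved union with fewer branches than the index names) produces an out-of-range lookup. The sketch below is a hedged, stdlib-only simulation of that mechanism, not the Avro library itself; all names (`zigzag_encode`, `union_branches`, etc.) are illustrative.

```python
def zigzag_encode(n: int) -> bytes:
    """Avro's variable-length zigzag encoding for long values."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        b = z & 0x7F
        z >>= 7
        if z:
            out.append(b | 0x80)  # continuation bit set
        else:
            out.append(b)
            return bytes(out)

def zigzag_decode(buf: bytes, pos: int = 0):
    """Decode one zigzag varint; returns (value, next_pos)."""
    shift, acc = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        acc |= (b & 0x7F) << shift
        if not (b & 0x80):
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1), pos

# Writer side: a union ["int", "null"], writing the int branch (index 0).
union_branches = ["int", "null"]
payload = zigzag_encode(0) + zigzag_encode(42)  # branch index, then value

# Reader side: the decoded index selects a branch of the resolved union.
idx, pos = zigzag_decode(payload)
branch = union_branches[idx]  # "int" -- a valid branch

# A corrupted index (or one the reader's resolved union cannot satisfy)
# has no matching branch: the Python analogue of the
# ArrayIndexOutOfBoundsException raised in Symbol$Alternative.getSymbol.
bad = zigzag_encode(7) + zigzag_encode(42)
bad_idx, _ = zigzag_decode(bad)
try:
    union_branches[bad_idx]
except IndexError:
    print("out-of-range union branch:", bad_idx)
```

This is consistent with the reporter's observation that the data reads fine in the map stage: the bad index appears only in the intermediate bytes the reducer deserializes, which points at corruption in flight rather than in the source files.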