[ https://issues.apache.org/jira/browse/AVRO-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thiruvalluvan M. G. updated AVRO-1953:
--------------------------------------
Component/s: java
> ArrayIndexOutOfBoundsException in org.apache.avro.io.parsing.Symbol$Alternative.getSymbol
> -----------------------------------------------------------------------------------------
>
> Key: AVRO-1953
> URL: https://issues.apache.org/jira/browse/AVRO-1953
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.4
> Reporter: Yong Zhang
> Priority: Major
>
> We are facing an issue where our Avro MapReduce job cannot process an Avro
> file in the reducer.
> Here is the schema of our data:
> {
>   "namespace" : "our package name",
>   "type" : "record",
>   "name" : "Lists",
>   "fields" : [
>     {"name" : "account_id", "type" : "long"},
>     {"name" : "list_id", "type" : "string"},
>     {"name" : "sequence_id", "type" : ["int", "null"]},
>     {"name" : "name", "type" : ["string", "null"]},
>     {"name" : "state", "type" : ["string", "null"]},
>     {"name" : "description", "type" : ["string", "null"]},
>     {"name" : "dynamic_filtered_list", "type" : ["int", "null"]},
>     {"name" : "filter_criteria", "type" : ["string", "null"]},
>     {"name" : "created_at", "type" : ["long", "null"]},
>     {"name" : "updated_at", "type" : ["long", "null"]},
>     {"name" : "deleted_at", "type" : ["long", "null"]},
>     {"name" : "favorite", "type" : ["int", "null"]},
>     {"name" : "delta", "type" : ["boolean", "null"]},
>     {
>       "name" : "list_memberships", "type" : {
>         "type" : "array", "items" : {
>           "name" : "ListMembership", "type" : "record",
>           "fields" : [
>             {"name" : "channel_id", "type" : "string"},
>             {"name" : "created_at", "type" : ["long", "null"]},
>             {"name" : "created_source", "type" : ["string", "null"]},
>             {"name" : "deleted_at", "type" : ["long", "null"]},
>             {"name" : "sequence_id", "type" : ["int", "null"]}
>           ]
>         }
>       }
>     }
>   ]
> }
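>
> For reference, here is a minimal sketch (not our production code; the file
> name lists.avsc and the field values are placeholders) of a round trip
> through the generic API, which is the same read path the reducer uses:
>
> import java.io.File;
> import java.util.Collections;
> import org.apache.avro.Schema;
> import org.apache.avro.file.DataFileReader;
> import org.apache.avro.file.DataFileWriter;
> import org.apache.avro.generic.GenericData;
> import org.apache.avro.generic.GenericDatumReader;
> import org.apache.avro.generic.GenericDatumWriter;
> import org.apache.avro.generic.GenericRecord;
>
> public class ListsRoundTrip {
>     public static void main(String[] args) throws Exception {
>         // Parse the schema above, assumed saved as lists.avsc.
>         Schema schema = new Schema.Parser().parse(new File("lists.avsc"));
>
>         // Non-union fields must be set; unset union fields stay null,
>         // which is a valid branch of every ["...", "null"] union above.
>         GenericRecord rec = new GenericData.Record(schema);
>         rec.put("account_id", 42L);
>         rec.put("list_id", "list-1");
>         rec.put("list_memberships", new GenericData.Array<GenericRecord>(
>             schema.getField("list_memberships").schema(),
>             Collections.<GenericRecord>emptyList()));
>
>         // Write one record to a container file and read it back.
>         File f = new File("lists.avro");
>         DataFileWriter<GenericRecord> writer = new DataFileWriter<GenericRecord>(
>             new GenericDatumWriter<GenericRecord>(schema));
>         writer.create(schema, f);
>         writer.append(rec);
>         writer.close();
>
>         DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
>             f, new GenericDatumReader<GenericRecord>(schema));
>         while (reader.hasNext()) {
>             System.out.println(reader.next());
>         }
>         reader.close();
>     }
> }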
> Our MapReduce job computes the delta of the above dataset and uses our merge
> logic to fold the latest changes back into it. The job runs daily and has
> worked fine for 18 months. In that time we have seen the merge job fail twice
> with the following error, always in the reducer stage. That means the Avro
> data is read successfully by the mappers and sent to the reducers, where we
> sort it by key and timestamp so the delta can be merged on the reducer side:
> java.lang.ArrayIndexOutOfBoundsException
>     at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
>     at org.apache.avro.hadoop.io.AvroDeserializer.deserialize(AvroDeserializer.java:108)
>     at org.apache.avro.hadoop.io.AvroDeserializer.deserialize(AvroDeserializer.java:48)
>     at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:142)
>     at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:117)
>     at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:297)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:165)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:652)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at java.security.AccessController.doPrivileged(AccessController.java:366)
>     at javax.security.auth.Subject.doAs(Subject.java:572)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
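>
> Looking at the Avro source, Symbol$Alternative.getSymbol(int) indexes an
> array holding one symbol per union branch, using the branch index decoded
> from the stream; a branch index outside the union (for example from a
> mangled varint) throws exactly this ArrayIndexOutOfBoundsException. Here is
> a minimal sketch of that failure mode (the two-branch union and the
> hand-written index are ours, just for illustration):
>
> import java.io.ByteArrayOutputStream;
> import org.apache.avro.Schema;
> import org.apache.avro.generic.GenericDatumReader;
> import org.apache.avro.io.BinaryDecoder;
> import org.apache.avro.io.BinaryEncoder;
> import org.apache.avro.io.DecoderFactory;
> import org.apache.avro.io.EncoderFactory;
>
> public class BadUnionIndex {
>     public static void main(String[] args) throws Exception {
>         // A two-branch union like our nullable fields; valid indexes: 0, 1.
>         Schema union = new Schema.Parser().parse("[\"int\", \"null\"]");
>
>         // Hand-encode an out-of-range branch index, as corrupt bytes might.
>         ByteArrayOutputStream out = new ByteArrayOutputStream();
>         BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
>         enc.writeIndex(2);  // the union has no branch 2
>         enc.flush();
>
>         // GenericDatumReader decodes through a ResolvingDecoder, so the bad
>         // index should surface in Symbol$Alternative.getSymbol, as above.
>         BinaryDecoder dec = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
>         new GenericDatumReader<Object>(union).read(null, dec);
>     }
> }
>
> If that reading is right, a single bad branch index in one record would be
> enough to abort the whole reduce task, which matches what we observe.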
> The MapReduce job eventually fails in the reducer stage. I don't think our
> data is corrupted, as it is read fine in the map stage. Every time we hit
> this error we have to pull the whole huge dataset from the source again,
> rebuild the Avro files, and restart the daily merges, only to run into the
> same issue again after several months, for a reason we don't yet understand.
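>
> When this happens again, one thing we can try is to rule the on-disk files
> in or out with a standalone scan like the sketch below (the path argument is
> a placeholder), which reads every record through the same GenericDatumReader
> path and reports where decoding first breaks:
>
> import java.io.File;
> import org.apache.avro.file.DataFileReader;
> import org.apache.avro.generic.GenericDatumReader;
> import org.apache.avro.generic.GenericRecord;
>
> public class ScanAvroFile {
>     public static void main(String[] args) throws Exception {
>         // args[0] is a placeholder path to one suspect part file.
>         DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
>             new File(args[0]), new GenericDatumReader<GenericRecord>());
>         long count = 0;
>         try {
>             while (reader.hasNext()) {
>                 reader.next();
>                 count++;
>             }
>             System.out.println("OK: read " + count + " records");
>         } catch (Exception e) {
>             System.err.println("Decode failed after " + count + " records: " + e);
>         } finally {
>             reader.close();
>         }
>     }
> }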
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)