[
https://issues.apache.org/jira/browse/AVRO-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038099#comment-13038099
]
ey-chih chow commented on AVRO-792:
-----------------------------------
I ran it again with 1.5.1 and the problem went away, as indicated below.
It was probably a problem with one of the jars (e.g. a stale Avro version on the classpath).
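For what it's worth, one way to check for that kind of jar conflict is to scan the relevant lib directories for more than one Avro version. The sketch below is illustrative only (the directory and jar names are fabricated, not taken from this report); it simulates a stale 1.4 jar sitting next to 1.5.1:

```shell
# Sketch: flag a directory that contains more than one Avro jar version.
# The directory and jar names here are fabricated for illustration.
libdir=$(mktemp -d)
touch "$libdir/avro-1.4.1.jar" "$libdir/avro-1.5.1.jar"   # simulate a stale 1.4 jar next to 1.5.1

# Extract the version part of each avro-*.jar filename and de-duplicate.
versions=$(find "$libdir" -name 'avro-*.jar' -exec basename {} \; \
           | sed 's/^avro-\(.*\)\.jar$/\1/' | sort -u)
count=$(printf '%s\n' "$versions" | wc -l)

if [ "$count" -gt 1 ]; then
  echo "WARNING: multiple Avro versions found:" $versions
fi
rm -rf "$libdir"
```

In a real check you would point this at $HADOOP_HOME/lib and at the job jar's own lib/ directory instead of a temp directory.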
cloudera@cloudera-demo:~/src/ngpipes-etl/dist$ hadoop jar ngpipesjobs.jar com.ngmoco.ngpipes.etl.NgEventETLJob -D ngpipes.environment=staging input/etl/test_avro_bugfix/2011-04-12/0200 etl_out avro/ngpipes-events.avdl
Input Path => input/etl/test_avro_bugfix/2011-04-12/0200
Log Start Time => 2011:04:12:02
Setting Job Name => NgEventETLJob 2011:04:12:02 2011:04:12:03
Output Path => etl_out
11/05/23 11:02:44 INFO mapred.FileInputFormat: Total input paths to process : 1
11/05/23 11:02:44 INFO mapred.JobClient: Running job: job_201105231018_0003
11/05/23 11:02:45 INFO mapred.JobClient: map 0% reduce 0%
11/05/23 11:02:56 INFO mapred.JobClient: map 1% reduce 0%
11/05/23 11:02:57 INFO mapred.JobClient: map 50% reduce 0%
11/05/23 11:02:58 INFO mapred.JobClient: map 100% reduce 0%
11/05/23 11:03:08 INFO mapred.JobClient: map 100% reduce 100%
11/05/23 11:03:09 INFO mapred.JobClient: Job complete: job_201105231018_0003
11/05/23 11:03:09 INFO mapred.JobClient: Counters: 27
11/05/23 11:03:09 INFO mapred.JobClient:   com.ngmoco.ngpipes.utils.NgPipesGlobals$EventClassCounter
11/05/23 11:03:09 INFO mapred.JobClient: PLUS_EVENT=109
11/05/23 11:03:09 INFO mapred.JobClient: Job Counters
11/05/23 11:03:09 INFO mapred.JobClient: Launched reduce tasks=1
11/05/23 11:03:09 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=21272
11/05/23 11:03:09 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
11/05/23 11:03:09 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
11/05/23 11:03:09 INFO mapred.JobClient: Launched map tasks=2
11/05/23 11:03:09 INFO mapred.JobClient: Data-local map tasks=2
11/05/23 11:03:09 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=10509
11/05/23 11:03:09 INFO mapred.JobClient:   com.ngmoco.ngpipes.etl.NgEventETLMapper$EventSourceTypes
11/05/23 11:03:09 INFO mapred.JobClient: PLUS_SERVER=109
11/05/23 11:03:09 INFO mapred.JobClient: FileSystemCounters
11/05/23 11:03:09 INFO mapred.JobClient: FILE_BYTES_READ=18844
11/05/23 11:03:09 INFO mapred.JobClient: HDFS_BYTES_READ=29284
11/05/23 11:03:09 INFO mapred.JobClient: FILE_BYTES_WRITTEN=233467
11/05/23 11:03:09 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=18120
11/05/23 11:03:09 INFO mapred.JobClient:   com.ngmoco.ngpipes.etl.NgEventETLMapper$Event
11/05/23 11:03:09 INFO mapred.JobClient: ERR_NO_AREL=18
11/05/23 11:03:09 INFO mapred.JobClient: ERR_NULL_VALUE=109
11/05/23 11:03:09 INFO mapred.JobClient: Map-Reduce Framework
11/05/23 11:03:09 INFO mapred.JobClient: Reduce input groups=5
11/05/23 11:03:09 INFO mapred.JobClient: Combine output records=0
11/05/23 11:03:09 INFO mapred.JobClient: Map input records=109
11/05/23 11:03:09 INFO mapred.JobClient: Reduce shuffle bytes=18850
11/05/23 11:03:09 INFO mapred.JobClient: Reduce output records=109
11/05/23 11:03:09 INFO mapred.JobClient: Spilled Records=218
11/05/23 11:03:09 INFO mapred.JobClient: Map output bytes=18538
11/05/23 11:03:09 INFO mapred.JobClient: Map input bytes=25081
11/05/23 11:03:09 INFO mapred.JobClient: Combine input records=0
11/05/23 11:03:09 INFO mapred.JobClient: Map output records=109
11/05/23 11:03:09 INFO mapred.JobClient: SPLIT_RAW_BYTES=358
11/05/23 11:03:09 INFO mapred.JobClient: Reduce input records=109
> map reduce job for avro 1.5 generates ArrayIndexOutOfBoundsException
> --------------------------------------------------------------------
>
> Key: AVRO-792
> URL: https://issues.apache.org/jira/browse/AVRO-792
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.5.0, 1.5.1
> Environment: Mac with VMWare running Linux training-vm-Ubuntu
> Reporter: ey-chih chow
> Priority: Blocker
> Fix For: 1.5.2
>
> Attachments: AVRO-792-2.patch, AVRO-792-3.patch, AVRO-792.patch,
> part-00000.avro, part-00000.avro, part-00001.avro, part-00001.avro
>
> Original Estimate: 504h
> Remaining Estimate: 504h
>
> We have an Avro map/reduce job that used to work with Avro 1.4 but broke
> with Avro 1.5. The M/R job with Avro 1.5 worked fine in our debugging
> environment, but broke when we moved to a real cluster. In one instance of
> testing, the job had 23 reducers. Four of them succeeded and the rest failed
> with the ArrayIndexOutOfBoundsException shown below. Here are two
> instances of the stack traces:
> =================================================================================
> java.lang.ArrayIndexOutOfBoundsException: -1576799025
>         at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
>         at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>         at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>         at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
>         at org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:232)
>         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:141)
>         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
>         at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
>         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
>         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
>         at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:86)
>         at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:68)
>         at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1136)
>         at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1076)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:246)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:242)
>         at org.apache.avro.mapred.HadoopReducerBase$ReduceIterable.next(HadoopReducerBase.java:47)
>         at com.ngmoco.ngpipes.etl.NgEventETLReducer.reduce(NgEventETLReducer.java:46)
>         at com.ngmoco.ngpipes.etl.NgEventETLReducer.reduce(NgEventETLReducer.java:1)
>         at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:60)
>         at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:30)
>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>         at org.apache.hadoop.mapred.Child.main(Child.java:234)
> =====================================================================================================
> java.lang.ArrayIndexOutOfBoundsException: 40
>         at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
>         at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>         at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>         at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
>         at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
>         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
>         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
>         at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:86)
>         at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:68)
>         at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1136)
>         at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1076)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:246)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:242)
>         at org.apache.avro.mapred.HadoopReducerBase$ReduceIterable.next(HadoopReducerBase.java:47)
>         at com.ngmoco.ngpipes.sourcing.sessions.NgSessionReducer.reduce(NgSessionReducer.java:74)
>         at com.ngmoco.ngpipes.sourcing.sessions.NgSessionReducer.reduce(NgSessionReducer.java:1)
>         at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:60)
>         at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:30)
>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>         at org.apache.hadoop.mapred.Child.main(Child.java:234)
> =====================================================================================================
> The signature of our map() is:
>     public void map(Utf8 input, AvroCollector<Pair<Utf8, GenericRecord>> collector, Reporter reporter) throws IOException;
> and reduce() is:
>     public void reduce(Utf8 key, Iterable<GenericRecord> values, AvroCollector<GenericRecord> collector, Reporter reporter) throws IOException;
> All the GenericRecords are of the same schema.
> There are many changes in the area of serialization/deserialization between
> Avro 1.4 and 1.5, but we could not figure out why the exceptions were generated.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira