[
https://issues.apache.org/jira/browse/AVRO-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021148#comment-13021148
]
ey-chih chow commented on AVRO-792:
-----------------------------------
I re-ran the test by cutting the number of input files to one and setting the
number of reducers to one. The job still failed, as shown below. Does it
make sense to try the first patch rather than patch 3? What kind of test
can I run to help you debug?
======================================================================================
cloudera@cloudera-demo:~/src/ngpipes-etl/dist$ hadoop jar ngpipesjobs.jar com.ngmoco.ngpipes.etl.NgEventETLJob -D mapred.reduce.tasks=1 input/etl/test_avro_bugfix/2011-04-12/0200 etl_out avro/ngpipes-events.avdl
Input Path => input/etl/test_avro_bugfix/2011-04-12/0200
Log Start Time => 2011:04:12:02
Setting Job Name => NgEventETLJob 2011:04:12:02 2011:04:12:03
Output Path => etl_out
Fetching From URL => http://partner.plusplus.com/admin/products.json
isProduction => false
11/04/15 14:17:41 INFO etl.NgEventETLJob: Setting plus.json.games.table
11/04/15 14:17:41 INFO mapred.FileInputFormat: Total input paths to process : 1
11/04/15 14:17:41 INFO mapred.JobClient: Running job: job_201104132218_0004
11/04/15 14:17:42 INFO mapred.JobClient: map 0% reduce 0%
11/04/15 14:17:51 INFO mapred.JobClient: map 100% reduce 0%
11/04/15 14:17:59 INFO mapred.JobClient: map 100% reduce 33%
11/04/15 14:18:01 INFO mapred.JobClient: Task Id : attempt_201104132218_0004_r_000000_0, Status : FAILED
java.lang.ArrayIndexOutOfBoundsException: 3
at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:246)
at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:223)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:123)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:147)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:119)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:110)
at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:86)
at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:68)
at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1136)
at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1076)
at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:246)
at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:242)
at org.apache.avro.mapred.HadoopReducerBase$ReduceIterable.next(HadoopReducerBase.java:47)
at com.ngmoco.ngpipes.etl.NgEventETLReducer.reduce(NgEventETLReducer.java:39)
at com.ngmoco.ngpipes.etl.NgEventETLReducer.reduce(NgEventETLReducer.java:1)
at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:60)
at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:30)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
at org.apache.hadoop.mapred.Child.main(Child.java:234)
============================================================================
> map reduce job for avro 1.5 generates ArrayIndexOutOfBoundsException
> --------------------------------------------------------------------
>
> Key: AVRO-792
> URL: https://issues.apache.org/jira/browse/AVRO-792
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.5.0
> Environment: Mac with VMWare running Linux training-vm-Ubuntu
> Reporter: ey-chih chow
> Assignee: Thiruvalluvan M. G.
> Priority: Blocker
> Fix For: 1.5.1
>
> Attachments: AVRO-792-2.patch, AVRO-792-3.patch, AVRO-792.patch
>
> Original Estimate: 504h
> Remaining Estimate: 504h
>
> We have an avro map/reduce job that used to work with avro 1.4 but is broken
> with avro 1.5. The M/R job with avro 1.5 worked fine in our debugging
> environment, but broke when we moved to a real cluster. In one instance of
> testing, the job had 23 reducers. Four of them succeeded and the rest failed
> with the ArrayIndexOutOfBoundsException shown below. Here are two
> instances of the stack traces:
> =================================================================================
> java.lang.ArrayIndexOutOfBoundsException: -1576799025
> at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
> at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
> at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
> at org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:232)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:141)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
> at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
> at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:86)
> at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:68)
> at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1136)
> at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1076)
> at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:246)
> at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:242)
> at org.apache.avro.mapred.HadoopReducerBase$ReduceIterable.next(HadoopReducerBase.java:47)
> at com.ngmoco.ngpipes.etl.NgEventETLReducer.reduce(NgEventETLReducer.java:46)
> at com.ngmoco.ngpipes.etl.NgEventETLReducer.reduce(NgEventETLReducer.java:1)
> at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:60)
> at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:30)
> at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> at org.apache.hadoop.mapred.Child.main(Child.java:234)
> =====================================================================================================
> java.lang.ArrayIndexOutOfBoundsException: 40
> at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
> at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
> at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
> at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
> at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:86)
> at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:68)
> at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1136)
> at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1076)
> at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:246)
> at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:242)
> at org.apache.avro.mapred.HadoopReducerBase$ReduceIterable.next(HadoopReducerBase.java:47)
> at com.ngmoco.ngpipes.sourcing.sessions.NgSessionReducer.reduce(NgSessionReducer.java:74)
> at com.ngmoco.ngpipes.sourcing.sessions.NgSessionReducer.reduce(NgSessionReducer.java:1)
> at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:60)
> at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:30)
> at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> at org.apache.hadoop.mapred.Child.main(Child.java:234)
> =====================================================================================================
> The signature of our map() is:
>   public void map(Utf8 input, AvroCollector<Pair<Utf8, GenericRecord>> collector, Reporter reporter) throws IOException;
> and reduce() is:
>   public void reduce(Utf8 key, Iterable<GenericRecord> values, AvroCollector<GenericRecord> collector, Reporter reporter) throws IOException;
> All the GenericRecords are of the same schema.
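> For reference, a minimal sketch of a reducer matching the reduce() signature above,
> written against the org.apache.avro.mapred API (the class name ExampleEventReducer and
> the identity-style body are hypothetical, not our actual code):
>
>   import java.io.IOException;
>   import org.apache.avro.generic.GenericRecord;
>   import org.apache.avro.mapred.AvroCollector;
>   import org.apache.avro.mapred.AvroReducer;
>   import org.apache.avro.util.Utf8;
>   import org.apache.hadoop.mapred.Reporter;
>
>   public class ExampleEventReducer extends AvroReducer<Utf8, GenericRecord, GenericRecord> {
>     @Override
>     public void reduce(Utf8 key, Iterable<GenericRecord> values,
>                        AvroCollector<GenericRecord> collector, Reporter reporter)
>         throws IOException {
>       // Iterating over the values drives the ResolvingDecoder; this is where the
>       // ArrayIndexOutOfBoundsException in the traces above is thrown.
>       for (GenericRecord value : values) {
>         collector.collect(value);
>       }
>     }
>   }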
> There are many changes in the area of serialization/deserialization between
> avro 1.4 and 1.5, but we could not figure out why the exceptions were generated.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira