[ https://issues.apache.org/jira/browse/AVRO-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015590#comment-13015590 ]
Scott Carey commented on AVRO-792:
----------------------------------
-1
This patch drops the performance of GenericDatumReader by a factor of 3 to 5. The
portion of time spent in GenericDatumReader itself, as opposed to BinaryDecoder,
has increased by up to a factor of 8.
Perf.java results with JVM args: -Xmx256m -Xms256m -server -XX:+UseParallelGC -XX:+UseCompressedOops
and params: -nowrite -generic -generic-onetime
Before:
{code}
Executing tests:
[GenericTest, GenericNested, GenericNestedFake, GenericWithDefault,
GenericWithOutOfOrder, GenericWithPromotion, GenericOneTimeDecoderUse,
GenericOneTimeReaderUse, GenericOneTimeUse]
readTests:true
writeTests:false
cycles=800
test name                         time      M entries/sec   M bytes/sec   bytes/cycle
GenericRead:                      5054 ms       3.297          127.953        808498
GenericNested_Read:               7701 ms       2.164           83.980        808498
GenericNestedFake_Read:           2869 ms       5.809          225.433        808498
GenericWithDefault_Read:          8993 ms       1.853           71.922        808498
GenericWithOutOfOrder_Read:       5051 ms       3.300          128.050        808498
GenericWithPromotion_Read:        5210 ms       3.199          124.132        808498
GenericOneTimeDecoderUse_Read:    5060 ms       3.293          127.804        808498
GenericOneTimeReaderUse_Read:     6433 ms       2.590          100.531        808498
GenericOneTimeUse_Read:           6306 ms       2.643          102.558        808498
{code}
After:
{code}
Executing tests:
[GenericTest, GenericNested, GenericNestedFake, GenericWithDefault,
GenericWithOutOfOrder, GenericWithPromotion, GenericOneTimeDecoderUse,
GenericOneTimeReaderUse, GenericOneTimeUse]
readTests:true
writeTests:false
cycles=800
test name                         time      M entries/sec   M bytes/sec   bytes/cycle
GenericRead:                     18918 ms       0.881           34.189        808498
GenericNested_Read:              21704 ms       0.768           29.800        808498
GenericNestedFake_Read:           2899 ms       5.747          223.037        808498
GenericWithDefault_Read:         25627 ms       0.650           25.238        808498
GenericWithOutOfOrder_Read:      21413 ms       0.778           30.206        808498
GenericWithPromotion_Read:       21254 ms       0.784           30.432        808498
GenericOneTimeDecoderUse_Read:   18831 ms       0.885           34.346        808498
GenericOneTimeReaderUse_Read:    18650 ms       0.894           34.680        808498
GenericOneTimeUse_Read:          18698 ms       0.891           34.592        808498
{code}
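For context, the read path these Generic* tests exercise is essentially the loop below (a minimal sketch of mine, not the literal Perf.java harness; the schema, encoded bytes, and cycle count are placeholders):
{code}
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

public class GenericReadLoop {
  // Decodes the same pre-encoded datum 'cycles' times with GenericDatumReader,
  // reusing the decoder and datum the way the Generic* tests do; the OneTime*
  // variants recreate the reader and/or decoder on every cycle instead.
  public static Object readMany(Schema schema, byte[] encoded, int cycles)
      throws IOException {
    GenericDatumReader<Object> reader = new GenericDatumReader<Object>(schema);
    BinaryDecoder decoder = null;
    Object reuse = null;
    for (int i = 0; i < cycles; i++) {
      decoder = DecoderFactory.get().binaryDecoder(encoded, decoder);
      reuse = reader.read(reuse, decoder);
    }
    return reuse;
  }
}
{code}
Note that the regression shows up even in this reuse-everything configuration (GenericRead), not only in the OneTime* variants.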
I captured a few dozen stack traces while the latter run was executing; the most
common ones are:
{noformat}
"main" prio=5 tid=103800800 nid=0x100501000 runnable [100500000]
java.lang.Thread.State: RUNNABLE
at org.apache.avro.io.Perf$GenericTest.getReader(Perf.java:1087)
at org.apache.avro.io.Perf$GenericTest.readInternal(Perf.java:1110)
at org.apache.avro.io.Perf$BasicTest.readTest(Perf.java:313)
at org.apache.avro.io.Perf.main(Perf.java:210)
"main" prio=5 tid=103800800 nid=0x100501000 runnable [100500000]
java.lang.Thread.State: RUNNABLE
at java.util.Arrays.copyOf(Arrays.java:2734)
at org.apache.avro.io.parsing.Parser.expandStack(Parser.java:66)
at org.apache.avro.io.parsing.Parser.pushProduction(Parser.java:151)
at org.apache.avro.io.parsing.Parser.advance(Parser.java:100)
at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:144)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:126)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:102)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:93)
at org.apache.avro.io.Perf$GenericTest.readInternal(Perf.java:1110)
at org.apache.avro.io.Perf$BasicTest.readTest(Perf.java:313)
at org.apache.avro.io.Perf.main(Perf.java:210)
{noformat}
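The second trace is the per-record field-ordering path: GenericDatumReader calls ResolvingDecoder.readFieldOrder() once per record, and the trace shows that call growing and re-walking the parser stack (Parser.advance -> pushProduction -> expandStack -> Arrays.copyOf). A hypothetical micro-check of just that path (mine, not part of Perf.java; it assumes a flat record schema whose fields are all ints) would look like:
{code}
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.ResolvingDecoder;

public class FieldOrderCheck {
  // Hypothetical micro-check: drives the same ResolvingDecoder.readFieldOrder()
  // path that dominates the traces above. Assumes 'encoded' holds 'records'
  // consecutive records of a flat, all-int schema (placeholder assumption).
  public static long readAll(Schema writer, Schema reader, byte[] encoded, int records)
      throws IOException {
    BinaryDecoder in = DecoderFactory.get().binaryDecoder(encoded, null);
    ResolvingDecoder resolver = DecoderFactory.get().resolvingDecoder(writer, reader, in);
    long sum = 0;
    for (int i = 0; i < records; i++) {
      Schema.Field[] order = resolver.readFieldOrder();  // parser work happens here, per record
      for (int f = 0; f < order.length; f++) {
        sum += resolver.readInt();                       // consume each field's value
      }
    }
    return sum;
  }
}
{code}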
> map reduce job for avro 1.5 generates ArrayIndexOutOfBoundsException
> --------------------------------------------------------------------
>
> Key: AVRO-792
> URL: https://issues.apache.org/jira/browse/AVRO-792
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.5.0
> Environment: Mac with VMWare running Linux training-vm-Ubuntu
> Reporter: ey-chih chow
> Assignee: Thiruvalluvan M. G.
> Priority: Blocker
> Fix For: 1.5.1
>
> Attachments: AVRO-792.patch
>
> Original Estimate: 504h
> Remaining Estimate: 504h
>
> We have an Avro map/reduce job that used to work with Avro 1.4 but is broken
> with Avro 1.5. The M/R job with Avro 1.5 worked fine in our debugging
> environment but broke when we moved to a real cluster. In one instance of
> testing, the job had 23 reducers; four of them succeeded and the rest failed
> with an ArrayIndexOutOfBoundsException. Here are two instances of the stack
> traces:
> =================================================================================
> java.lang.ArrayIndexOutOfBoundsException: -1576799025
> at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
> at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
> at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
> at org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:232)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:141)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
> at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
> at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:86)
> at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:68)
> at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1136)
> at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1076)
> at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:246)
> at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:242)
> at org.apache.avro.mapred.HadoopReducerBase$ReduceIterable.next(HadoopReducerBase.java:47)
> at com.ngmoco.ngpipes.etl.NgEventETLReducer.reduce(NgEventETLReducer.java:46)
> at com.ngmoco.ngpipes.etl.NgEventETLReducer.reduce(NgEventETLReducer.java:1)
> at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:60)
> at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:30)
> at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> at org.apache.hadoop.mapred.Child.main(Child.java:234)
> =====================================================================================================
> java.lang.ArrayIndexOutOfBoundsException: 40
> at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
> at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
> at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
> at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
> at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:86)
> at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:68)
> at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1136)
> at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1076)
> at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:246)
> at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:242)
> at org.apache.avro.mapred.HadoopReducerBase$ReduceIterable.next(HadoopReducerBase.java:47)
> at com.ngmoco.ngpipes.sourcing.sessions.NgSessionReducer.reduce(NgSessionReducer.java:74)
> at com.ngmoco.ngpipes.sourcing.sessions.NgSessionReducer.reduce(NgSessionReducer.java:1)
> at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:60)
> at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:30)
> at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> at org.apache.hadoop.mapred.Child.main(Child.java:234)
> =====================================================================================================
> The signature of our map() is:
> public void map(Utf8 input, AvroCollector<Pair<Utf8, GenericRecord>> collector, Reporter reporter) throws IOException;
> and reduce() is:
> public void reduce(Utf8 key, Iterable<GenericRecord> values, AvroCollector<GenericRecord> collector, Reporter reporter) throws IOException;
> All the GenericRecords are of the same schema.
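> A minimal sketch of how such a job is typically wired up with AvroJob (the driver,
> mapper, and reducer class names here are illustrative placeholders, not our actual
> code; input/output paths are omitted):
> {code}
> import org.apache.avro.Schema;
> import org.apache.avro.mapred.AvroJob;
> import org.apache.avro.mapred.Pair;
> import org.apache.hadoop.mapred.JobClient;
> import org.apache.hadoop.mapred.JobConf;
>
> public class MyAvroJob {                                           // hypothetical driver class
>   public static void main(String[] args) throws Exception {
>     JobConf conf = new JobConf(MyAvroJob.class);
>     Schema recordSchema = new Schema.Parser().parse(args[0]);      // placeholder: record schema JSON
>     AvroJob.setInputSchema(conf, Schema.create(Schema.Type.STRING));
>     AvroJob.setMapOutputSchema(conf,
>         Pair.getPairSchema(Schema.create(Schema.Type.STRING), recordSchema));
>     AvroJob.setOutputSchema(conf, recordSchema);
>     AvroJob.setMapperClass(conf, MyMapper.class);    // hypothetical; extends AvroMapper, map() as above
>     AvroJob.setReducerClass(conf, MyReducer.class);  // hypothetical; extends AvroReducer, reduce() as above
>     JobClient.runJob(conf);                          // input/output paths omitted for brevity
>   }
> }
> {code}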
> There are many changes in the area of serialization/deserialization between
> Avro 1.4 and 1.5, but we could not figure out why these exceptions were generated.