[
https://issues.apache.org/jira/browse/AVRO-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609486#comment-13609486
]
Alexandre Normand commented on AVRO-1268:
-----------------------------------------
I've hit a snag. The parallel tree solution mostly works, but not with
recursive schemas: building the tree for a recursive schema fails with a
StackOverflowError. To work around that, state elements would need some way of
pointing to previously built state, which in turn requires a mechanism to look
up those states. That essentially goes back to using a Map. Or maybe we could
be clever and keep such a Map only while building the state tree.
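A minimal sketch of that last idea, with the Map alive only during the build.
The Node and State classes here are hypothetical stand-ins (not Avro's Schema,
and not the classes in the patch); the only point is that each state is
registered before descending into its children, so a recursive reference
resolves to the state already being built instead of recursing forever:

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class StateTreeSketch {

    /** Hypothetical stand-in for a schema node; NOT Avro's Schema class. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<Node>();
        Node(String name) { this.name = name; }
    }

    /** Hypothetical per-schema state, mirroring the shape of the schema. */
    static final class State {
        final String name;
        final List<State> children = new ArrayList<State>();
        State(String name) { this.name = name; }
    }

    /** Builds the parallel state tree; the map is discarded once this returns. */
    static State build(Node root) {
        return build(root, new IdentityHashMap<Node, State>());
    }

    private static State build(Node node, Map<Node, State> seen) {
        State existing = seen.get(node);
        if (existing != null) {
            return existing; // recursive reference: point back at the built state
        }
        State state = new State(node.name);
        seen.put(node, state); // register BEFORE descending, or recursion never ends
        for (Node child : node.children) {
            state.children.add(build(child, seen));
        }
        return state;
    }

    public static void main(String[] args) {
        Node list = new Node("LinkedList");
        list.children.add(new Node("value"));
        list.children.add(list); // the "next" field refers back to the record itself
        State state = build(list);
        System.out.println(state.children.get(1) == state); // prints true
    }
}
```

Lookups after the build would then follow direct references between State
objects, so the Map would cost something only at construction time. Whether
that is enough to beat the Map-lookup numbers is not something this sketch
shows.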
In any event, I ran the perf test even with recursive schemas currently
broken, and the results are similar to the ones I had with the Map lookup
(some are a bit slower than my last results, but that might just be
run-to-run variation):
{code}
Executing tests:
[IntTest, SmallLongTest, LongTest, FloatTest, DoubleTest, BoolTest, BytesTest,
StringTest, ArrayTest, MapTest, RecordTest, ValidatingRecord, ResolvingRecord,
RecordWithDefault, RecordWithOutOfOrder, RecordWithPromotion, GenericTest,
GenericStrings, GenericNested, GenericNestedFake, GenericWithDefault,
GenericWithOutOfOrder, GenericWithPromotion, GenericOneTimeDecoderUse,
GenericOneTimeReaderUse, GenericOneTimeUse, FooBarSpecificRecordTest]
readTests:true
writeTests:true
cycles=800
test name time M entries/sec M bytes/sec bytes/cycle
IntRead: 712 ms 280.712 706.637 629325
IntWrite: 1456 ms 137.289 345.598 629325
SmallLongRead: 801 ms 249.549 628.189 629325
SmallLongWrite: 1478 ms 135.299 340.589 629325
LongRead: 1690 ms 118.319 516.984 1092353
LongWrite: 2708 ms 73.842 322.645 1092353
FloatRead: 374 ms 533.773 2135.093 1000000
FloatWrite: 1201 ms 166.403 665.611 1000000
DoubleRead: 353 ms 564.997 4519.978 2000000
DoubleWrite: 1906 ms 104.890 839.122 2000000
BooleanRead: 249 ms 802.739 802.739 250000
BooleanWrite: 524 ms 381.048 381.048 250000
BytesRead: 1613 ms 24.797 881.263 1776937
BytesWrite: 2138 ms 18.708 664.843 1776937
StringRead: 8433 ms 4.743 168.929 1780910
StringWrite: 8097 ms 4.940 175.956 1780910
ArrayRead: 403 ms 495.969 1983.888 1000006
ArrayWrite: 1162 ms 172.068 688.277 1000006
MapRead: 1368 ms 146.168 730.842 1250004
MapWrite: 2127 ms 94.017 470.089 1250004
RecordRead: 650 ms 51.265 1989.599 1617069
RecordWrite: 1987 ms 16.769 650.802 1617069
ValidatingRecordRead: 3790 ms 8.793 341.252 1617069
ValidatingRecordWrite: 3656 ms 9.116 353.806 1617069
ResolvingRecordRead: 4159 ms 8.014 311.034 1617069
RecordWithDefaultRead: 11357 ms 2.935 113.905 1617069
RecordWithOutOfOrderRead: 3295 ms 10.115 392.571 1617069
RecordWithPromotionRead: 3603 ms 9.251 359.046 1617069
GenericRead: 5192 ms 3.210 124.566 808498
GenericWrite: 3285 ms 5.072 196.853 808498
GenericStringsRead: 5913 ms 2.818 300.434 2220873
GenericStringsWrite: 13740 ms 1.213 129.301 2220873
GenericNested_Read: 8459 ms 1.970 76.459 808498
GenericNested_Write: 4873 ms 3.420 132.708 808498
GenericNestedFake_Read: 3397 ms 4.906 190.390 808498
GenericNestedFake_Write: 1596 ms 10.442 405.222 808498
GenericWithDefault_Read: 10029 ms 1.662 64.492 808498
GenericWithOutOfOrder_Read: 5355 ms 3.112 120.782 808498
GenericWithPromotion_Read: 5369 ms 3.104 120.464 808498
GenericOneTimeDecoderUse_Read: 5324 ms 3.130 121.473 808498
GenericOneTimeReaderUse_Read: 8190 ms 2.035 78.965 808498
GenericOneTimeUse_Read: 8274 ms 2.014 78.171 808498
FooBarSpecificRecordTestRead: 37938 ms 0.439 73.409 3481319
FooBarSpecificRecordTestWrite: 31674 ms 0.526 87.927 3481319
{code}
I'm not sure it's worth continuing down that path. If you get better results
with the parser-based solution, it's probably the way to go.
> Add java-class, java-key-class and java-element-class support for stringable
> types to SpecificData
> --------------------------------------------------------------------------------------------------
>
> Key: AVRO-1268
> URL: https://issues.apache.org/jira/browse/AVRO-1268
> Project: Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.7.4
> Reporter: Alexandre Normand
> Assignee: Alexandre Normand
> Priority: Minor
> Fix For: 1.7.5
>
> Attachments: AVRO-1268-needs-work.patch, AVRO-1268.patch,
> AVRO-1268.patch, AVRO-1268-performance.patch, AVRO-1268.sh,
> GenericStringsPerf.patch, pseudo.patch, pseudo.patch
>
>
> Stringable types are Java classes that can be serialized through strings
> (which requires a single-string constructor and a valid toString()
> implementation). ReflectData currently has support for stringable types but
> it would be desirable to get this feature with SpecificData.
> The work involves changes to the SpecificCompiler (depends on {{@java-class}}
> support in AVRO-1267) to generate the specific sources with the proper java
> type as well as moving the ReflectDatumReader and ReflectDatumWriter to read
> the java-class/java-key-class and java-element-class properties.