[
https://issues.apache.org/jira/browse/AVRO-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994177#comment-12994177
]
Scott Carey commented on AVRO-753:
----------------------------------
Performance results from the above patch.
I tested with Sun JRE 6u22 (64 bit) on Mac OS X 10.6.6 on a 2.4 GHz Intel Core
i5 (2 cores, 4 threads, can 'turbo' up to 2.93 GHz).
I used the following JVM arguments:
-server -Xmx256m -Xms256m -XX:+UseParallelGC -XX:+UseCompressedOops
-XX:+DoEscapeAnalysis -XX:+UseLoopPredicate
ParallelGC is fast and the most common collector on servers. CompressedOops is
_highly_ recommended when running 64 bit; it improves performance and reduces
memory footprint.
The last two flags are on by default in JRE 6u23 and above, but not in 6u22,
and they have a measurable impact on these tests. UseLoopPredicate alone speeds
up a couple of cases by 10%.
A 32 bit JVM slows down somewhat. In particular, writeLong is about 35%
slower, and a few other cases degrade by 15% or so. Some others (writeDouble,
writeFloat) don't change. The extra registers, and native 64 bit integer
registers, help some of the inner loops significantly. I expect non-Intel
hardware to behave more like the 64 bit case.
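(For context: writeLong is a zig-zag, variable-length encoding over a 64 bit
value, so its inner loop is all 64 bit shifts and masks. A minimal sketch of
the shape of that loop, illustrative only, not the actual encoder code:)
{noformat}
// Sketch only: zig-zag maps small magnitudes (positive or negative) to small
// varints, then 7 bits are emitted per byte with the high bit meaning "more".
// On a 32 bit JVM the long shifts and compares below become register-pair
// arithmetic, which is where the ~35% writeLong penalty comes from.
class VarIntSketch {
  static int encodeLong(long n, byte[] buf, int pos) {
    n = (n << 1) ^ (n >> 63);            // zig-zag encode
    while ((n & ~0x7FL) != 0) {          // more than 7 significant bits left
      buf[pos++] = (byte) ((n & 0x7F) | 0x80);
      n >>>= 7;
    }
    buf[pos++] = (byte) n;               // final byte, high bit clear
    return pos;                          // next free position in buf
  }
}
{noformat}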
I ran with the '-noread' command line option of Perf.java
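(If you want to reproduce this, the full command line was along these lines;
the classpath and Perf's package name here are from memory, so adjust as
needed:)
{noformat}
java -server -Xmx256m -Xms256m -XX:+UseParallelGC -XX:+UseCompressedOops \
     -XX:+DoEscapeAnalysis -XX:+UseLoopPredicate \
     -cp <test classes and dependencies> org.apache.avro.io.Perf -noread
{noformat}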
This is the performance of the legacy encoder:
{noformat}
old legacy encoder:
test name                 time    M entries/sec    M bytes/sec    bytes/cycle
IntWrite: 3784 ms 52.849 133.036 629325
SmallLongWrite: 3715 ms 53.828 135.500 629325
LongWrite: 6153 ms 32.502 142.013 1092353
FloatWrite: 7289 ms 27.437 109.748 1000000
DoubleWrite: 13988 ms 14.298 114.383 2000000
BooleanWrite: 2150 ms 93.001 93.001 250000
BytesWrite: 2588 ms 15.451 549.113 1776937
StringWrite: 9656 ms 4.142 147.535 1780910
ArrayWrite: 7315 ms 27.340 109.359 1000006
MapWrite: 8727 ms 22.916 114.581 1250004
RecordWrite: 10204 ms 3.266 126.771 1617069
ValidatingRecordWrite: 11584 ms 2.877 111.673 1617069
GenericWrite: 7522 ms 2.216 85.986 808498
GenericNested_Write: 9713 ms 1.716 66.588 808498
GenericNestedFake_Write: 5893 ms 2.828 109.743 808498
{noformat}
And the new BinaryEncoder:
{noformat}
test name                 time    M entries/sec    M bytes/sec    bytes/cycle
IntWrite: 1558 ms 128.342 323.076 629325
SmallLongWrite: 1495 ms 133.760 336.714 629325
LongWrite: 2736 ms 73.083 319.329 1092353
FloatWrite: 1286 ms 155.517 622.066 1000000
DoubleWrite: 2005 ms 99.742 797.935 2000000
BooleanWrite: 597 ms 334.696 334.696 250000
BytesWrite: 2491 ms 16.054 570.550 1776937
StringWrite: 9050 ms 4.420 157.417 1780910
ArrayWrite: 1352 ms 147.852 591.412 1000006
MapWrite: 2245 ms 89.054 445.269 1250004
RecordWrite: 2418 ms 13.780 534.813 1617069
ValidatingRecordWrite: 4191 ms 7.952 308.631 1617069
GenericWrite: 3477 ms 4.792 185.978 808498
GenericNested_Write: 5661 ms 2.944 114.249 808498
GenericNestedFake_Write: 2068 ms 8.057 312.696 808498
{noformat}
Performance ranges from 2x to 7x faster, except for writing byte arrays and
strings, which are only slightly faster. The test above writes strings and byte
arrays that average 35 bytes in size; smaller ones will benefit more from the
buffering, especially with high-overhead OutputStreams.
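(For the curious, the buffering win is the usual one: small writes are copied
into an internal array and the underlying OutputStream only sees a few large
write() calls. A rough sketch of the idea, with illustrative names, not the
actual patch code:)
{noformat}
import java.io.IOException;
import java.io.OutputStream;

// Illustrative sketch: small writes land in an internal buffer; large ones
// bypass it after a flush, so the OutputStream sees few, large writes.
class BufferingSketch {
  private final OutputStream out;
  private final byte[] buf = new byte[2048];
  private int pos;

  BufferingSketch(OutputStream out) { this.out = out; }

  void writeFixed(byte[] bytes, int start, int len) throws IOException {
    if (len > buf.length) {        // too large to be worth copying
      flushBuffer();
      out.write(bytes, start, len);
      return;
    }
    if (pos + len > buf.length) {  // not enough room left in the buffer
      flushBuffer();
    }
    System.arraycopy(bytes, start, buf, pos, len);
    pos += len;
  }

  void flushBuffer() throws IOException {
    if (pos > 0) {
      out.write(buf, 0, pos);
      pos = 0;
    }
  }
}
{noformat}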
This is the performance of the new non-buffering variation, DirectBinaryEncoder:
{noformat}
test name                 time    M entries/sec    M bytes/sec    bytes/cycle
IntWrite: 3446 ms 58.023 146.062 629325
SmallLongWrite: 3491 ms 57.274 144.176 629325
LongWrite: 5931 ms 33.716 147.320 1092353
FloatWrite: 4337 ms 46.105 184.419 1000000
DoubleWrite: 5525 ms 36.194 289.556 2000000
BooleanWrite: 1949 ms 102.603 102.603 250000
BytesWrite: 2814 ms 14.212 505.091 1776937
StringWrite: 9480 ms 4.219 150.285 1780910
ArrayWrite: 4437 ms 45.068 180.273 1000006
MapWrite: 5803 ms 34.464 172.321 1250004
RecordWrite: 5005 ms 6.659 258.446 1617069
ValidatingRecordWrite: 6519 ms 5.113 198.419 1617069
GenericWrite: 4978 ms 3.348 129.920 808498
GenericNested_Write: 6966 ms 2.392 92.838 808498
GenericNestedFake_Write: 3507 ms 4.752 184.430 808498
{noformat}
This ranges from no change to about 2.5x faster than the 'legacy'
BinaryEncoder, with float and double encoding significantly faster and most
other cases only slightly faster. It is still substantially slower than the
buffering variation.
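(The float/double gain is easy to explain: a float is written as the 4
little-endian bytes of its IEEE bit pattern, so assembling those bytes in a
scratch array and issuing one write beats four separate OutputStream.write(int)
calls. An illustrative sketch, not the encoder's code:)
{noformat}
import java.io.IOException;
import java.io.OutputStream;

// Sketch only: Avro encodes a float as the 4 little-endian bytes of
// Float.floatToIntBits(f); one bulk write beats four single-byte writes.
class FloatWriteSketch {
  static void writeFloat(float f, OutputStream out) throws IOException {
    int bits = Float.floatToIntBits(f);
    byte[] scratch = new byte[4];
    scratch[0] = (byte) (bits         & 0xFF);
    scratch[1] = (byte) ((bits >>> 8)  & 0xFF);
    scratch[2] = (byte) ((bits >>> 16) & 0xFF);
    scratch[3] = (byte) ((bits >>> 24) & 0xFF);
    out.write(scratch, 0, 4);
  }
}
{noformat}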
Next up: BlockingBinaryEncoder. Its performance is essentially the same as
BinaryEncoder's; it defaults to a larger buffer (64K instead of 2K) and is
therefore slightly faster, except for MapWrite and ArrayWrite, where blocking
is in effect.
{noformat}
test name                 time    M entries/sec    M bytes/sec    bytes/cycle
IntWrite: 1512 ms 132.260 332.937 629325
SmallLongWrite: 1459 ms 137.012 344.902 629325
LongWrite: 2640 ms 75.739 330.937 1092353
FloatWrite: 1265 ms 158.088 632.352 1000000
DoubleWrite: 1999 ms 100.004 800.032 2000000
BooleanWrite: 638 ms 313.294 313.294 250000
BytesWrite: 2458 ms 16.273 578.305 1776937
StringWrite: 9259 ms 4.320 153.862 1780910
ArrayWrite: 1443 ms 138.580 554.373 1000098
MapWrite: 2589 ms 77.233 386.200 1250119
RecordWrite: 3001 ms 11.104 430.964 1617069
ValidatingRecordWrite: 5829 ms 5.718 221.933 1617069
GenericWrite: 3545 ms 4.701 182.450 808498
GenericNested_Write: 5831 ms 2.858 110.906 808498
GenericNestedFake_Write: 2052 ms 8.119 315.091 808498
{noformat}
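(Why ArrayWrite and MapWrite pay a small price here: with blocking in effect,
each array/map block is prefixed by a negative item count plus the block's
encoded size in bytes, so the encoder has to account for each block's length
before it can emit it. The layout, schematically, using the Encoder API's
writeLong/writeFixed rather than BlockingBinaryEncoder's internals:)
{noformat}
import java.io.IOException;
import org.apache.avro.io.Encoder;

// Sketch of the blocked wire layout only (not BlockingBinaryEncoder's code):
// a negative item count signals that the block's byte size follows, which is
// what lets readers skip whole blocks without decoding the items.
class BlockedArraySketch {
  static void writeOneBlock(Encoder enc, long itemCount,
                            byte[] encodedItems, int len) throws IOException {
    enc.writeLong(-itemCount);            // negative count => byte size follows
    enc.writeLong(len);                   // encoded size of this block in bytes
    enc.writeFixed(encodedItems, 0, len); // the pre-encoded items
  }
  // ...after the last block, a count of 0 terminates the array:
  //   enc.writeLong(0);
}
{noformat}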
And for those curious, this is what JSON looks like:
{noformat}
test name                 time    M entries/sec    M bytes/sec    bytes/cycle
IntWrite: 10238 ms 19.534 115.334 1476104
SmallLongWrite: 10383 ms 19.261 113.722 1476104
LongWrite: 18078 ms 11.063 109.950 2484706
FloatWrite: 50300 ms 3.976 42.252 2656635
DoubleWrite: 96585 ms 2.071 39.894 4816469
BooleanWrite: 8940 ms 22.369 123.022 1374900
BytesWrite: 40859 ms 0.979 72.197 3687468
StringWrite: 9021 ms 4.434 166.411 1876635
ArrayWrite: 59728 ms 3.349 54.000 4031647
MapWrite: 63564 ms 3.146 55.460 4406637
RecordWrite: 63687 ms 0.523 64.246 5114596
ValidatingRecordWrite: 65488 ms 0.509 62.480 5114596
GenericWrite: 34985 ms 0.476 58.478 2557400
GenericNested_Write: 42137 ms 0.396 58.047 3057392
GenericNestedFake_Write: 37551 ms 0.444 65.134 3057392
{noformat}
Note that all of these results (including the legacy ones) include improved
String <-> Utf8 conversion in Utf8.java. This brings String encoding up from
~120 MB/sec to ~160 MB/sec. I had noticed that Jackson was faster than our
binary encoder for the string test case; now it is a tie. There is more to do
there, but the time is dominated by JVM code that isn't as optimal as it
should be.
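(For reference, the kind of change involved is moving string <-> UTF-8
conversion onto the JDK's bulk conversion paths instead of converting character
by character; an illustrative sketch of the approach, not the actual Utf8.java
diff:)
{noformat}
import java.io.UnsupportedEncodingException;

// Illustrative only: the JDK's built-in UTF-8 coder (reachable via
// String.getBytes / new String) beats a hand-rolled per-character loop.
class Utf8Sketch {
  static byte[] toUtf8(String s) {
    try {
      return s.getBytes("UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new RuntimeException(e);   // UTF-8 is always available
    }
  }
  static String fromUtf8(byte[] bytes, int length) {
    try {
      return new String(bytes, 0, length, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new RuntimeException(e);
    }
  }
}
{noformat}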
> Java: Improve BinaryEncoder Performance
> ----------------------------------------
>
> Key: AVRO-753
> URL: https://issues.apache.org/jira/browse/AVRO-753
> Project: Avro
> Issue Type: Improvement
> Components: java
> Reporter: Scott Carey
> Assignee: Scott Carey
> Fix For: 1.5.0
>
> Attachments: AVRO-753.v1.patch, AVRO-753.v2.patch
>
>
> BinaryEncoder has not had a performance improvement pass like BinaryDecoder
> did. It still mostly writes directly to the underlying OutputStream which is
> not optimal for performance. I like to use a rule that if you are writing to
> an OutputStream or reading from an InputStream in chunks smaller than 128
> bytes, you have a performance problem.
> Measurements indicate that optimizing BinaryEncoder yields a 2.5x to 6x
> performance improvement. The process is significantly simpler than
> BinaryDecoder because 'pushing' is easier than 'pulling' -- and also because
> we do not need a 'direct' variant because BinaryEncoder already buffers
> sometimes.