[
https://issues.apache.org/jira/browse/AVRO-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Scott Carey updated AVRO-753:
-----------------------------
Attachment: AVRO-753.v1.patch
This patch implements an experimental variation of BinaryEncoder named
FastBinaryEncoder, along with its accompanying EncoderFactory.
This is a first-pass proof of concept. A final patch would replace
BinaryEncoder rather than introduce FastBinaryEncoder. Keeping both for now
allows a side-by-side comparison with the old encoder using the new
Perf.java tool in AVRO-752.
The results of Perf in '-noread' mode are as follows.
BinaryEncoder (original):
{noformat}
IntWrite: 2219 ms, 36.047 million entries/sec.
90.729 million bytes/sec
SmallLongWrite: 2253 ms, 35.499 million entries/sec.
89.350 million bytes/sec
LongWrite: 4494 ms, 17.801 million entries/sec.
77.769 million bytes/sec
FloatWrite: 3088 ms, 25.900 million entries/sec.
103.599 million bytes/sec
DoubleWrite: 6000 ms, 13.333 million entries/sec.
106.663 million bytes/sec
BooleanWrite: 876 ms, 91.265 million entries/sec.
91.265 million bytes/sec
BytesWrite: 1007 ms, 15.882 million entries/sec.
565.653 million bytes/sec
StringWrite: 4835 ms, 3.309 million entries/sec.
117.875 million bytes/sec
RecordWrite: 5333 ms, 2.500 million entries/sec.
97.016 million bytes/sec
ValidatingRecordWrite: 5741 ms, 2.322 million entries/sec.
90.121 million bytes/sec
GenericWrite: 3953 ms, 1.686 million entries/sec.
65.439 million bytes/sec
GenericNested_Write: 4429 ms, 1.505 million entries/sec.
58.408 million bytes/sec
{noformat}
FastBinaryEncoder:
{noformat}
IntWrite: 693 ms, 115.425 million entries/sec.
290.518 million bytes/sec
SmallLongWrite: 797 ms, 100.329 million entries/sec.
252.522 million bytes/sec
LongWrite: 1323 ms, 60.450 million entries/sec.
264.097 million bytes/sec
FloatWrite: 561 ms, 142.443 million entries/sec.
569.772 million bytes/sec
DoubleWrite: 893 ms, 89.528 million entries/sec.
716.227 million bytes/sec
BooleanWrite: 317 ms, 252.174 million entries/sec.
252.174 million bytes/sec
BytesWrite: 843 ms, 18.979 million entries/sec.
675.963 million bytes/sec
StringWrite: 4631 ms, 3.455 million entries/sec.
123.065 million bytes/sec
RecordWrite: 1255 ms, 10.617 million entries/sec.
412.047 million bytes/sec
ValidatingRecordWrite: 1686 ms, 7.907 million entries/sec.
306.883 million bytes/sec
GenericWrite: 1302 ms, 5.119 million entries/sec.
198.660 million bytes/sec
GenericNested_Write: 2073 ms, 3.215 million entries/sec.
124.769 million bytes/sec
{noformat}
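Performance is roughly 2.5 to 6 times faster on most tests; BytesWrite
(about 1.2x) and StringWrite (about 1.04x) improve only marginally.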
There is more tuning and testing to do, but I wanted to checkpoint my work at
this point and share progress.
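To check the per-test ratios behind the summary above, the elapsed times from the two runs can be divided test by test. A minimal sketch follows; the class name PerfSpeedup is hypothetical and is not part of Perf.java, and the numbers are copied from the two listings above.

```java
/** Hypothetical helper for comparing Perf timings; not part of Perf.java. */
class PerfSpeedup {
    /** Ratio of old elapsed time to new elapsed time; above 1.0 means faster. */
    static double speedup(long oldMillis, long newMillis) {
        return (double) oldMillis / newMillis;
    }

    public static void main(String[] args) {
        String[] tests = {
            "IntWrite", "SmallLongWrite", "LongWrite", "FloatWrite",
            "DoubleWrite", "BooleanWrite", "BytesWrite", "StringWrite",
            "RecordWrite", "ValidatingRecordWrite", "GenericWrite",
            "GenericNested_Write"
        };
        // Elapsed milliseconds reported for BinaryEncoder (original).
        long[] original = {2219, 2253, 4494, 3088, 6000, 876, 1007, 4835, 5333, 5741, 3953, 4429};
        // Elapsed milliseconds reported for FastBinaryEncoder.
        long[] fast = {693, 797, 1323, 561, 893, 317, 843, 4631, 1255, 1686, 1302, 2073};

        for (int i = 0; i < tests.length; i++) {
            System.out.printf("%-22s %.2fx%n", tests[i], speedup(original[i], fast[i]));
        }
    }
}
```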
> Java: Improve BinaryEncoder Performance
> ----------------------------------------
>
> Key: AVRO-753
> URL: https://issues.apache.org/jira/browse/AVRO-753
> Project: Avro
> Issue Type: Improvement
> Components: java
> Reporter: Scott Carey
> Assignee: Scott Carey
> Fix For: 1.5.0
>
> Attachments: AVRO-753.v1.patch
>
>
> BinaryEncoder has not had a performance improvement pass like BinaryDecoder
> did. It still mostly writes directly to the underlying OutputStream which is
> not optimal for performance. I like to use a rule that if you are writing to
> an OutputStream or reading from an InputStream in chunks smaller than 128
> bytes, you have a performance problem.
> Measurements indicate that optimizing BinaryEncoder yields a 2.5x to 6x
> performance improvement. The process is significantly simpler than
> BinaryDecoder because 'pushing' is easier than 'pulling' -- and also because
> we do not need a 'direct' variant because BinaryEncoder already buffers
> sometimes.
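The buffering idea in the description can be illustrated with a small sketch: encode values into an in-memory byte array and hand the underlying OutputStream one large chunk at a time instead of a few bytes per call. This is not the code in AVRO-753.v1.patch; the class BufferedIntWriter and its methods are hypothetical names chosen for illustration, and only the zig-zag variable-length int encoding follows the Avro specification.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch of a buffering int encoder; not FastBinaryEncoder itself. */
class BufferedIntWriter {
    private static final int MAX_INT_BYTES = 5; // a zig-zag varint int needs at most 5 bytes

    private final OutputStream out;
    private final byte[] buf;
    private int pos = 0;

    BufferedIntWriter(OutputStream out, int bufferSize) {
        this.out = out;
        // Always leave room for at least one fully encoded int.
        this.buf = new byte[Math.max(bufferSize, MAX_INT_BYTES)];
    }

    /** Encodes an int as a zig-zag varint into the buffer, flushing first if space is short. */
    void writeInt(int n) throws IOException {
        if (buf.length - pos < MAX_INT_BYTES) {
            flushBuffer();
        }
        int v = (n << 1) ^ (n >> 31); // zig-zag: small magnitudes become small unsigned values
        while ((v & ~0x7F) != 0) {
            buf[pos++] = (byte) ((v & 0x7F) | 0x80); // low 7 bits, continuation bit set
            v >>>= 7;
        }
        buf[pos++] = (byte) v;
    }

    /** Pushes buffered bytes to the stream and flushes the stream. */
    void flush() throws IOException {
        flushBuffer();
        out.flush();
    }

    private void flushBuffer() throws IOException {
        if (pos > 0) {
            out.write(buf, 0, pos); // one large write instead of many tiny ones
            pos = 0;
        }
    }

    /** Usage example: encodes the given values and returns the resulting bytes. */
    static byte[] encodeAll(int bufferSize, int... values) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedIntWriter writer = new BufferedIntWriter(sink, bufferSize);
        try {
            for (int value : values) {
                writer.writeInt(value);
            }
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }
}
```

With a buffer of a few kilobytes, the underlying stream sees writes well above the 128-byte rule of thumb regardless of how small each encoded value is. The caller has to call flush() before reading the stream's contents, which is the main behavioural difference from an encoder that writes through directly.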