[ 
https://issues.apache.org/jira/browse/AVRO-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-753:
-----------------------------

    Attachment: AVRO-753.v1.patch

This patch implements an experimental variation of BinaryEncoder named 
FastBinaryEncoder and its accompanying EncoderFactory.

This is a first-pass proof of concept.  A final patch would replace 
BinaryEncoder rather than introduce FastBinaryEncoder.  Keeping both for now 
allows a side-by-side comparison with the old encoder using the new 
Perf.java tool from AVRO-752.

The results of Perf in '-noread' mode are as follows.

BinaryEncoder (original):
{noformat}
                     IntWrite:  2219 ms,     36.047 million entries/sec.     90.729 million bytes/sec
               SmallLongWrite:  2253 ms,     35.499 million entries/sec.     89.350 million bytes/sec
                    LongWrite:  4494 ms,     17.801 million entries/sec.     77.769 million bytes/sec
                   FloatWrite:  3088 ms,     25.900 million entries/sec.    103.599 million bytes/sec
                  DoubleWrite:  6000 ms,     13.333 million entries/sec.    106.663 million bytes/sec
                 BooleanWrite:   876 ms,     91.265 million entries/sec.     91.265 million bytes/sec
                   BytesWrite:  1007 ms,     15.882 million entries/sec.    565.653 million bytes/sec
                  StringWrite:  4835 ms,      3.309 million entries/sec.    117.875 million bytes/sec
                  RecordWrite:  5333 ms,      2.500 million entries/sec.     97.016 million bytes/sec
        ValidatingRecordWrite:  5741 ms,      2.322 million entries/sec.     90.121 million bytes/sec
                 GenericWrite:  3953 ms,      1.686 million entries/sec.     65.439 million bytes/sec
          GenericNested_Write:  4429 ms,      1.505 million entries/sec.     58.408 million bytes/sec
{noformat}

FastBinaryEncoder:
{noformat}
                     IntWrite:   693 ms,    115.425 million entries/sec.    290.518 million bytes/sec
               SmallLongWrite:   797 ms,    100.329 million entries/sec.    252.522 million bytes/sec
                    LongWrite:  1323 ms,     60.450 million entries/sec.    264.097 million bytes/sec
                   FloatWrite:   561 ms,    142.443 million entries/sec.    569.772 million bytes/sec
                  DoubleWrite:   893 ms,     89.528 million entries/sec.    716.227 million bytes/sec
                 BooleanWrite:   317 ms,    252.174 million entries/sec.    252.174 million bytes/sec
                   BytesWrite:   843 ms,     18.979 million entries/sec.    675.963 million bytes/sec
                  StringWrite:  4631 ms,      3.455 million entries/sec.    123.065 million bytes/sec
                  RecordWrite:  1255 ms,     10.617 million entries/sec.    412.047 million bytes/sec
        ValidatingRecordWrite:  1686 ms,      7.907 million entries/sec.    306.883 million bytes/sec
                 GenericWrite:  1302 ms,      5.119 million entries/sec.    198.660 million bytes/sec
          GenericNested_Write:  2073 ms,      3.215 million entries/sec.    124.769 million bytes/sec
{noformat}

Most tests run roughly 2 to 6.7 times faster.  BytesWrite and StringWrite 
improve only slightly (about 1.2x and 1.04x).
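
The per-test speedup can be recomputed from the entries/sec figures in the 
two tables above.  A minimal sketch follows; the class and method names are 
illustrative only and are not part of the patch or of Perf.java.

```java
/** Illustrative helper; the rates are copied from the two Perf tables above. */
class EncoderSpeedup {
    static final String[] TESTS = {
        "IntWrite", "SmallLongWrite", "LongWrite", "FloatWrite",
        "DoubleWrite", "BooleanWrite", "BytesWrite", "StringWrite",
        "RecordWrite", "ValidatingRecordWrite", "GenericWrite",
        "GenericNested_Write"
    };

    // Million entries/sec with the original BinaryEncoder.
    static final double[] ORIGINAL = {
        36.047, 35.499, 17.801, 25.900, 13.333, 91.265,
        15.882, 3.309, 2.500, 2.322, 1.686, 1.505
    };

    // Million entries/sec with FastBinaryEncoder.
    static final double[] FAST = {
        115.425, 100.329, 60.450, 142.443, 89.528, 252.174,
        18.979, 3.455, 10.617, 7.907, 5.119, 3.215
    };

    /** Ratio of the new throughput to the old throughput. */
    static double speedup(double originalRate, double fastRate) {
        return fastRate / originalRate;
    }

    public static void main(String[] args) {
        for (int i = 0; i < TESTS.length; i++) {
            System.out.printf("%22s: %.2fx%n",
                TESTS[i], speedup(ORIGINAL[i], FAST[i]));
        }
    }
}
```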

There is more tuning and testing to do, but I wanted to checkpoint my work at 
this point and share progress.

> Java:  Improve BinaryEncoder Performance
> ----------------------------------------
>
>                 Key: AVRO-753
>                 URL: https://issues.apache.org/jira/browse/AVRO-753
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>             Fix For: 1.5.0
>
>         Attachments: AVRO-753.v1.patch
>
>
> BinaryEncoder has not had a performance improvement pass like BinaryDecoder 
> did.  It still mostly writes directly to the underlying OutputStream which is 
> not optimal for performance.  I like to use a rule that if you are writing to 
> an OutputStream or reading from an InputStream in chunks smaller than 128 
> bytes, you have a performance problem.
> Measurements indicate that optimizing BinaryEncoder yields a 2.5x to 6x 
> performance improvement.  The process is significantly simpler than 
> BinaryDecoder because 'pushing' is easier than 'pulling' -- and also because 
> we do not need a 'direct' variant because BinaryEncoder already buffers 
> sometimes.
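
As a rough illustration of the buffering idea in the description above, the 
sketch below collects small writes in a byte array and hands them to the 
OutputStream in larger chunks.  It is not the code in the patch: the class 
name BufferedVarintWriter and its methods are hypothetical, and only int 
writing (Avro's zig-zag variable-length encoding) is shown.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical sketch of a buffering writer; not FastBinaryEncoder itself. */
class BufferedVarintWriter {
    private final OutputStream out;
    private final byte[] buf;
    private int pos = 0;

    BufferedVarintWriter(OutputStream out, int bufferSize) {
        if (bufferSize < 5) {
            throw new IllegalArgumentException("buffer must hold a 5-byte int");
        }
        this.out = out;
        this.buf = new byte[bufferSize];
    }

    /** Writes an int using zig-zag variable-length encoding (1 to 5 bytes). */
    void writeInt(int n) throws IOException {
        // Make room for the worst case before touching the buffer.
        if (buf.length - pos < 5) {
            flushBuffer();
        }
        int v = (n << 1) ^ (n >> 31); // zig-zag: small magnitudes stay small
        while ((v & ~0x7F) != 0) {
            buf[pos++] = (byte) ((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        buf[pos++] = (byte) v;
    }

    /** Pushes buffered bytes to the stream and flushes the stream. */
    void flush() throws IOException {
        flushBuffer();
        out.flush();
    }

    // One bulk write to the underlying stream instead of many tiny ones.
    private void flushBuffer() throws IOException {
        if (pos > 0) {
            out.write(buf, 0, pos);
            pos = 0;
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedVarintWriter w = new BufferedVarintWriter(sink, 8192);
        for (int i = 0; i < 1_000_000; i++) {
            w.writeInt(i);
        }
        w.flush();
        System.out.println(sink.size() + " bytes written");
    }
}
```

The caller has to call flush() before reading from the stream, since bytes 
may still be sitting in the buffer.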

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
