[ 
https://issues.apache.org/jira/browse/AVRO-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-753:
-----------------------------

    Attachment: AVRO-753.v2.patch

This patch changes BinaryEncoder for significantly improved performance.  This 
requires that all users of BinaryEncoder use the Encoder API properly and call 
flush() as needed.
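
A minimal sketch of why the flush() requirement matters. It is not code from 
the patch: java.io.BufferedOutputStream stands in for the buffering encoder, 
and the class name FlushDemo is made up for illustration. Bytes written 
through a 2k buffer do not reach the underlying stream until flush() is called.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical demo class; BufferedOutputStream is only a stand-in for a
// buffering encoder, not the BinaryEncoder from the patch.
class FlushDemo {
    /** Returns {bytes visible in the sink before flush, bytes visible after flush}. */
    static int[] writeThenFlush() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        OutputStream buffered = new BufferedOutputStream(sink, 2048);

        buffered.write(new byte[] {1, 2, 3});
        int before = sink.size();   // the three bytes are still held in the buffer

        buffered.flush();
        int after = sink.size();    // now they have reached the sink

        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] r = writeThenFlush();
        System.out.println("before flush: " + r[0] + ", after flush: " + r[1]);
    }
}
```

Code that never calls flush() would leave the sink empty here, which is the 
kind of silent bug discussed below.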

This has resulted in four BinaryEncoder-related classes:
* AbstractBinaryEncoder -- defines the common API and has much shared code, 
mostly low level encoding functions.
* BinaryEncoder -- a fast encoder that buffers, by default up to 2k.
* BlockingBinaryEncoder -- a buffering encoder that implements blocking of 
arrays and maps, extends BinaryEncoder
* DirectBinaryEncoder -- a light-weight encoder that does not buffer but is 
about 2.2 times slower than BinaryEncoder.
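
The split between shared encoding code and buffered versus direct output can 
be sketched roughly as follows. This is a simplified illustration, not the 
patch itself: the Sketch* class names, the writeBytes() hook, and the 
constructor signatures are assumptions made for the example. Only the 
zig-zag variable-length int encoding follows the Avro binary format.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical base class: holds the shared low-level encoding logic.
abstract class SketchEncoder {
    /** Writes an int using zig-zag, variable-length encoding. */
    public void writeInt(int n) throws IOException {
        int z = (n << 1) ^ (n >> 31);          // zig-zag: small magnitudes -> small values
        byte[] tmp = new byte[5];              // an int needs at most 5 bytes
        int len = 0;
        while ((z & ~0x7F) != 0) {
            tmp[len++] = (byte) ((z & 0x7F) | 0x80);
            z >>>= 7;
        }
        tmp[len++] = (byte) z;
        writeBytes(tmp, 0, len);
    }

    protected abstract void writeBytes(byte[] b, int off, int len) throws IOException;

    public abstract void flush() throws IOException;
}

// Hypothetical buffering variant: collects bytes and writes them in chunks.
class SketchBufferedEncoder extends SketchEncoder {
    private final OutputStream out;
    private final byte[] buf;
    private int pos;

    SketchBufferedEncoder(OutputStream out, int bufferSize) {
        this.out = out;
        this.buf = new byte[bufferSize];
    }

    @Override
    protected void writeBytes(byte[] b, int off, int len) throws IOException {
        if (len > buf.length - pos) {
            drain();                           // make room first
        }
        if (len > buf.length) {
            out.write(b, off, len);            // too large to buffer at all
            return;
        }
        System.arraycopy(b, off, buf, pos, len);
        pos += len;
    }

    private void drain() throws IOException {
        out.write(buf, 0, pos);
        pos = 0;
    }

    @Override
    public void flush() throws IOException {
        drain();
        out.flush();
    }
}

// Hypothetical direct variant: no buffer, every write goes straight through.
class SketchDirectEncoder extends SketchEncoder {
    private final OutputStream out;

    SketchDirectEncoder(OutputStream out) {
        this.out = out;
    }

    @Override
    protected void writeBytes(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
    }

    @Override
    public void flush() throws IOException {
        out.flush();
    }
}
```

The direct variant carries no buffer state, which keeps it light, at the cost 
of one OutputStream call per encoded value.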

I have implemented an EncoderFactory and deprecated Encoder.init(OutputStream) 
in favor of having the factory or implementations take care of that.  There are 
some other options for this factory that might better hide abstractions like 
BlockingBinaryEncoder, but the one included here is the simplest.

The decisions / discussions around this change that I am uncertain of are:

* API Changes and migration: This change makes BinaryEncoder buffer all the 
time, instead of only sometimes.  All prior uses that did not call flush() were 
bugs, but they are surely out in the wild.  This variation leaves BinaryEncoder 
constructable the old way (the constructor is deprecated, but still there) so 
users might silently introduce bugs from this change.  We could remove the 
constructor entirely, and force a choice through the factory to solve this 
instead.
* Class Hierarchy:  AbstractBinaryEncoder is package-private, and 
DirectBinaryEncoder does not inherit from BinaryEncoder (to keep it 
lightweight with minimal member variables and overrides).  Another option is to 
rename BinaryEncoder to BufferedBinaryEncoder, and then change the name of 
AbstractBinaryEncoder to BinaryEncoder and make it public.  This is probably 
the best representation of the classes, but means that BinaryEncoder can no 
longer be constructed.  It could lead to a cleaner Factory as well -- the 
factory could always return the abstract BinaryEncoder type and thus we could 
hide more implementation details behind it and not expose the concrete classes.

I prefer the cleaner factory and class hierarchy to encapsulate the details.  
For example, it would allow us to later merge BufferedBinaryEncoder and 
BlockingBinaryEncoder and not affect any user code.  But it means that right 
now, we break an API without deprecating it first -- BinaryEncoder would not 
have public constructors.  A side effect would be that users' code would no 
longer compile, forcing them to choose the fast buffered implementation or the 
slower direct one.
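
A rough sketch of that preferred layout, under assumed names 
(HypotheticalEncoder, HypotheticalEncoderFactory, buffered(), direct() are 
all invented for illustration and are not the patch's API): the factory 
returns only the abstract type, so the concrete classes stay hidden and could 
later be merged or replaced without touching caller code.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Plays the role of the proposed abstract BinaryEncoder type (hypothetical name).
abstract class HypotheticalEncoder {
    abstract void writeByte(int b) throws IOException;
    abstract void flush() throws IOException;
}

// Hypothetical factory: callers pick a behavior, never a concrete class.
final class HypotheticalEncoderFactory {
    private HypotheticalEncoderFactory() {}

    /** Fast path: output is buffered, so callers must call flush(). */
    static HypotheticalEncoder buffered(OutputStream out) {
        return new Buffered(out);
    }

    /** Slower path: every write goes straight to the stream. */
    static HypotheticalEncoder direct(OutputStream out) {
        return new Direct(out);
    }

    // Implementations are private, so they can change freely.
    private static final class Buffered extends HypotheticalEncoder {
        private final OutputStream out;

        Buffered(OutputStream out) {
            this.out = new BufferedOutputStream(out, 2048);
        }

        @Override
        void writeByte(int b) throws IOException {
            out.write(b);
        }

        @Override
        void flush() throws IOException {
            out.flush();
        }
    }

    private static final class Direct extends HypotheticalEncoder {
        private final OutputStream out;

        Direct(OutputStream out) {
            this.out = out;
        }

        @Override
        void writeByte(int b) throws IOException {
            out.write(b);
        }

        @Override
        void flush() throws IOException {
            out.flush();
        }
    }
}
```

With no public constructor available, existing code that constructs an 
encoder directly stops compiling, and the author has to choose between 
buffered() and direct() explicitly.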


> Java:  Improve BinaryEncoder Performance
> ----------------------------------------
>
>                 Key: AVRO-753
>                 URL: https://issues.apache.org/jira/browse/AVRO-753
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>             Fix For: 1.5.0
>
>         Attachments: AVRO-753.v1.patch, AVRO-753.v2.patch
>
>
> BinaryEncoder has not had a performance improvement pass like BinaryDecoder 
> did.  It still mostly writes directly to the underlying OutputStream which is 
> not optimal for performance.  I like to use a rule that if you are writing to 
> an OutputStream or reading from an InputStream in chunks smaller than 128 
> bytes, you have a performance problem.
> Measurements indicate that optimizing BinaryEncoder yields a 2.5x to 6x 
> performance improvement.  The process is significantly simpler than 
> BinaryDecoder because 'pushing' is easier than 'pulling' -- and also because 
> we do not need a 'direct' variant because BinaryEncoder already buffers 
> sometimes.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
