[ 
https://issues.apache.org/jira/browse/AVRO-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991207#comment-12991207
 ] 

Scott Carey commented on AVRO-753:
----------------------------------

Pursuing this further has led to new information, some questions, and some 
trouble.

* The old BinaryEncoder in most cases wrote directly to the output stream.  In 
some cases it buffered (writeBytes).  Almost every use of it in Avro assumes 
that it does not buffer.  Although we know from the mailing lists that many 
users have run into the buffering and now call flush(), many likely do not.  
We therefore need something akin to "DirectBinaryEncoder", and another big 
note in CHANGES.txt.  This should be much simpler than the Decoder case.
* BlockingBinaryEncoder should be easy to adapt, and integrate with the 
factory.  It should become simpler than it is now.
* Does it make sense to have BinaryEncoder extend BufferedOutputStream?  
And likewise make "DirectBinaryEncoder" extend OutputStream?  This should 
make the semantics easier for users to understand, and they would not have to 
keep a reference to the underlying stream around to close.  Any use case where 
one "weaves" Avro and non-Avro data into the same stream gets much simpler too.

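As a rough illustration of the last bullet, here is a minimal sketch of a 
buffering encoder that is itself an OutputStream.  The class name 
BufferedEncoderSketch, its constructor, and the buffer size are hypothetical 
and are not part of any Avro release or of the attached patch.  The only Avro 
fact it relies on is that the binary format writes an int as a zig-zag varint.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch, not Avro API: an encoder that buffers and is also an OutputStream.
final class BufferedEncoderSketch extends OutputStream {
    private final OutputStream out;   // underlying stream, owned by this encoder
    private final byte[] buf;         // pending bytes not yet handed to 'out'
    private int pos;

    BufferedEncoderSketch(OutputStream out, int bufferSize) {
        this.out = out;
        this.buf = new byte[Math.max(1, bufferSize)];
    }

    // Raw, non-Avro bytes go through the same buffer, so ordering is preserved.
    @Override
    public void write(int b) throws IOException {
        if (pos == buf.length) {
            flushBuffer();
        }
        buf[pos++] = (byte) b;
    }

    // Avro binary encoding of an int: zig-zag, then 7 bits per byte, low bits first.
    public void writeInt(int n) throws IOException {
        int v = (n << 1) ^ (n >> 31);
        while ((v & ~0x7F) != 0) {
            write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        write(v);
    }

    private void flushBuffer() throws IOException {
        out.write(buf, 0, pos);
        pos = 0;
    }

    // Nothing is guaranteed to reach the underlying stream until flush() or close().
    @Override
    public void flush() throws IOException {
        flushBuffer();
        out.flush();
    }

    // Closing the encoder closes the underlying stream; no second reference is needed.
    @Override
    public void close() throws IOException {
        flush();
        out.close();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedEncoderSketch enc = new BufferedEncoderSketch(sink, 64);
        enc.writeInt(1);                                        // still buffered
        System.out.println("before flush: " + sink.size() + " bytes");
        enc.write('x');                                         // "weaved" non-Avro byte
        enc.close();                                            // flushes, then closes sink
        System.out.println("after close: " + sink.size() + " bytes");
    }
}
```

A "direct" counterpart under the same assumptions would drop buf and pass each 
byte straight to the underlying stream, so callers that never call flush() 
would still see their data.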

I have made a few more performance improvements.  The big one is to 
writeString(String), which goes from ~125MB/sec to ~183MB/sec.  The downside is 
that it requires an additional 50 lines of code, while a simpler, 5 line 
variation gets 160MB/sec.  This is the big one for the "thrift/protobuf 
compare" performance benchmark.  See 
http://evanjones.ca/software/java-string-encoding.html
We could try adapting the raw UTF-8 code from the Hadoop project and see if 
that is faster.  Perhaps for 1.5.0, we keep it simple and go with the 160MB/sec 
variant and research faster string encoding and decoding on its own later.
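
For context, the comment does not show either writeString variant.  The sketch 
below is only a plain baseline for what any variant has to produce: Avro's 
binary format writes a string as a zig-zag varint long holding the UTF-8 byte 
count, followed by the UTF-8 bytes.  The class and method names are 
hypothetical, and this is not claimed to be the 5 line or the 50 line version 
from the patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical baseline, not the patch's code: length-prefixed UTF-8 string output.
final class StringEncodingSketch {

    // Avro binary encoding of a long: zig-zag, then 7 bits per byte, low bits first.
    static void writeLong(OutputStream out, long n) throws IOException {
        long v = (n << 1) ^ (n >> 63);
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    // Simple approach: let the JDK encode into a temporary byte[], then copy it out.
    // The extra allocation and copy per string is the cost faster variants try to avoid.
    static void writeString(OutputStream out, String s) throws IOException {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeString(sink, "h\u00e9llo");   // 5 chars, 6 UTF-8 bytes
        System.out.println(sink.size() + " bytes written");
    }
}
```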



> Java:  Improve BinaryEncoder Performance
> ----------------------------------------
>
>                 Key: AVRO-753
>                 URL: https://issues.apache.org/jira/browse/AVRO-753
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>             Fix For: 1.5.0
>
>         Attachments: AVRO-753.v1.patch
>
>
> BinaryEncoder has not had a performance improvement pass like BinaryDecoder 
> did.  It still mostly writes directly to the underlying OutputStream which is 
> not optimal for performance.  I like to use a rule that if you are writing to 
> an OutputStream or reading from an InputStream in chunks smaller than 128 
> bytes, you have a performance problem.
> Measurements indicate that optimizing BinaryEncoder yields a 2.5x to 6x 
> performance improvement.  The process is significantly simpler than 
> BinaryDecoder because 'pushing' is easier than 'pulling' -- and also because 
> we do not need a 'direct' variant because BinaryEncoder already buffers 
> sometimes.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
