[ https://issues.apache.org/jira/browse/AVRO-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991284#comment-12991284 ]

Scott Carey edited comment on AVRO-679 at 2/7/11 6:25 AM:
----------------------------------------------------------

I'd expect bigger gains for longs, and for encode than for decode.

Avro's int decode runs at about 4 to 5 clock cycles per byte; trying to shave a 
clock cycle off that is hard, since every byte requires a conditional operation, 
an array write, a counter increment, and some masking and shifting.
The group varint stuff can cut that down, though.
Longs are still at 8 or so cycles per byte and might have more room to gain 
than ints.
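
For reference, the per-byte work looks roughly like this (a sketch of zig-zag 
varint int decode over a raw byte[], not the actual BinaryDecoder code; the 
class and names below are made up):
{code}
// Sketch only: per-byte zig-zag varint int decode over a raw byte[].
// Not Avro's BinaryDecoder code; it just shows the per-byte conditional,
// array access, position increment, mask, and shift described above.
class VarIntDecodeSketch {
  private final byte[] buf;
  private int pos;

  VarIntDecodeSketch(byte[] buf) { this.buf = buf; }

  int readVarInt() {
    int b = buf[pos++] & 0xff;    // array read + position increment
    int n = b & 0x7f;             // mask off the continuation bit
    int shift = 7;
    while (b > 0x7f) {            // conditional per byte
      b = buf[pos++] & 0xff;
      n |= (b & 0x7f) << shift;   // mask and shift into place
      shift += 7;
    }
    return (n >>> 1) ^ -(n & 1);  // undo zig-zag
  }
}
{code}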

For your code, the big thing that is likely slowing you down is:
{code}
            buff.put(b(m));
{code}
buff.put is a lot slower than a byte array assignment.  All ByteBuffer access is 
slow at this level.
Unfortunately, ByteBuffer is polymorphic, and thus 'put' is a virtual method.  A 
couple of things to try: perhaps use ReadWriteHeapByteBuffer, which is final and 
so can be better optimized and is not polymorphic at your call site.  Or, 
unpack the byte buffer higher up and pass around (byte[], position) through 
your methods to get raw byte[] access, which is fastest.
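
For illustration, the unpacked version might look something like this (the 
method and names are hypothetical; the only point is plain byte[] stores plus 
an explicit position instead of ByteBuffer.put calls):
{code}
// Sketch only: zig-zag varint int encode into a raw byte[] at a tracked
// offset, instead of calling the virtual ByteBuffer.put().  The names are
// made up; only the byte[]-plus-position shape is the suggestion.
final class VarIntEncodeSketch {
  static int writeVarInt(int n, byte[] buf, int pos) {
    n = (n << 1) ^ (n >> 31);                   // zig-zag encode
    while ((n & ~0x7f) != 0) {
      buf[pos++] = (byte) ((n & 0x7f) | 0x80);  // plain array store, no virtual call
      n >>>= 7;
    }
    buf[pos++] = (byte) n;
    return pos;                                 // caller carries the new position
  }
}
{code}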

You might also try using the Avro Perf.java test and adding a new read method to 
BinaryDecoder to test it there.  The way the methods and buffers are set up in 
BinaryDecoder was tweaked heavily to get the JVM to inline and optimize the 
right methods in the right order.  Several equivalent rearrangements of the 
code in BinaryDecoder are slower.  It would be easier to isolate the 
differences if both used the same general setup and buffer.


Group varint encoding lends itself to specialized processor features such as 
SIMD and other vectorized instructions, so it might benefit C/C++ more than Java.
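
To make that concrete, one group decodes roughly like this in plain Java (a 
sketch of the scheme from the slides linked in the issue description below, not 
an Avro API; in C/C++ the inner loops can become table-driven unaligned loads 
or SIMD shuffles, which is where the big win is):
{code}
// Sketch only: group varint decode of one group of four ints.  A 1-byte
// tag holds four 2-bit length codes (0..3 meaning 1..4 bytes), followed
// by the four values in little-endian order.
final class GroupVarIntSketch {
  static int decodeGroup(byte[] buf, int pos, int[] out) {
    int tag = buf[pos++] & 0xff;
    for (int i = 0; i < 4; i++) {
      int len = ((tag >>> (2 * i)) & 0x3) + 1;  // byte length of value i
      int v = 0;
      for (int j = 0; j < len; j++) {
        v |= (buf[pos++] & 0xff) << (8 * j);    // assemble little-endian
      }
      out[i] = v;
    }
    return pos;                                 // position after the group
  }
}
{code}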

On the encode side, Avro isn't optimized well yet.  See AVRO-753.  We should be 
able to improve int and long encoding by about a factor of 2.5.


> Improved encodings for arrays
> -----------------------------
>
>                 Key: AVRO-679
>                 URL: https://issues.apache.org/jira/browse/AVRO-679
>             Project: Avro
>          Issue Type: New Feature
>          Components: spec
>            Reporter: Stu Hood
>            Priority: Minor
>
> There are better ways to encode arrays of varints [1] which are faster to 
> decode, and more space efficient than encoding varints independently.
> Extending the idea to other types of variable length data like 'bytes' and 
> 'string', you could encode the entries for an array block as an array of 
> lengths, followed by contiguous byte/utf8 data.
> [1] group varint encoding: slides 57-63 of 
> http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/people/jeff/WSDM09-keynote.pdf
