[ 
https://issues.apache.org/jira/browse/AVRO-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638232#comment-13638232
 ] 

Scott Carey commented on AVRO-1282:
-----------------------------------

I think changing the binary format for primitives is probably off the table.  
Google got performance improvements by going from zig-zag encoding to group 
varint encoding.  We tried that in the past as an experiment without much luck, 
but with Unsafe we may do better.  Group varint encoding for arrays of ints and 
longs will be fast and significantly smaller than raw ints/longs.  Extending 
the Avro spec to allow for group varint encoding may be a better choice than 
using fixed-size int/long.  float/double are already fixed size and can use 
Unsafe when the native byte order of the system matches (e.g. x86).
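
For readers comparing the two encodings mentioned above, here is a minimal sketch. The per-value writer follows the zig-zag plus 7-bit varint layout Avro uses for ints. The group writer is only an illustration of the general group varint idea (one tag byte carrying four 2-bit lengths, followed by the payload bytes); applying zig-zag before grouping is an assumption made here so that negative values stay small. The class and method names are hypothetical, and this layout is not part of the Avro spec.

```java
final class VarintSketch {
    // Zig-zag maps small positive and negative ints to small unsigned values.
    static int zigZag(int n) { return (n << 1) ^ (n >> 31); }
    static int unZigZag(int z) { return (z >>> 1) ^ -(z & 1); }

    // Per-value varint: 7 payload bits per byte, high bit set means "more follows".
    static int writeVarInt(int n, byte[] buf, int pos) {
        int v = zigZag(n);
        while ((v & ~0x7F) != 0) {
            buf[pos++] = (byte) ((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        buf[pos++] = (byte) v;
        return pos;
    }

    // Group varint (illustrative layout): a tag byte holds four 2-bit
    // (length - 1) fields, then each value follows as 1 to 4 little-endian bytes.
    static int writeGroup(int[] src, int off, byte[] buf, int pos) {
        int tagPos = pos++;
        int tag = 0;
        for (int i = 0; i < 4; i++) {
            int v = zigZag(src[off + i]);
            int len = v == 0 ? 1 : (32 - Integer.numberOfLeadingZeros(v) + 7) / 8;
            tag |= (len - 1) << (2 * i);
            for (int b = 0; b < len; b++) {
                buf[pos++] = (byte) (v >>> (8 * b));
            }
        }
        buf[tagPos] = (byte) tag;
        return pos;
    }

    // Reads one group of four values; the lengths come from the tag byte,
    // so the loop has no per-byte continuation test.
    static int readGroup(byte[] buf, int pos, int[] dst, int off) {
        int tag = buf[pos++] & 0xFF;
        for (int i = 0; i < 4; i++) {
            int len = ((tag >>> (2 * i)) & 3) + 1;
            int v = 0;
            for (int b = 0; b < len; b++) {
                v |= (buf[pos++] & 0xFF) << (8 * b);
            }
            dst[off + i] = unZigZag(v);
        }
        return pos;
    }
}
```

The point of the grouped form is that the decoder learns all four lengths from one byte up front, instead of testing a continuation bit on every byte.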

Float and Double are already written as 4 or 8 byte chunks, and you can use the 
float <> int bit conversion or double <> long bit conversion to pack ints or 
longs that way if you wish to test performance differences.
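
A minimal sketch of that bit-conversion trick, using a plain little-endian `ByteBuffer` as a stand-in for the encoder (with Avro itself the value would go through the encoder's float/double write methods instead). The class and method names are hypothetical. One caveat worth assuming: the `Float.intBitsToFloat` documentation notes that NaN bit patterns may not survive unchanged on every platform, so this is suitable for a performance experiment rather than as a general-purpose encoding.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class FixedWidthViaFloat {
    // Little-endian buffer, matching how Avro lays out float/double bytes.
    static ByteBuffer newBuffer(int capacity) {
        return ByteBuffer.allocate(capacity).order(ByteOrder.LITTLE_ENDIAN);
    }

    // Write an int as a fixed 4-byte chunk by reinterpreting its bits as a float.
    static void putIntAsFloat(ByteBuffer buf, int v) {
        buf.putFloat(Float.intBitsToFloat(v));
    }

    // Read the 4-byte chunk back and recover the original int bits.
    static int getIntFromFloat(ByteBuffer buf) {
        return Float.floatToRawIntBits(buf.getFloat());
    }

    // Same idea for long <> double, using a fixed 8-byte chunk.
    static void putLongAsDouble(ByteBuffer buf, long v) {
        buf.putDouble(Double.longBitsToDouble(v));
    }

    static long getLongFromDouble(ByteBuffer buf) {
        return Double.doubleToRawLongBits(buf.getDouble());
    }
}
```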

On the other hand, we could pack our variable length writes into an int/long on 
the stack, then use Unsafe rather than per-byte writes, which might help.  This 
would not work when writing to an OutputStream, but could work for writing to 
byte buffers or byte[].  Since the fields are variable length, we would have to 
'rewind' the write position.
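
A minimal sketch of the pack-then-rewind idea, under some assumptions: the value is a zig-zag varint int (at most 5 bytes, so it fits in one long), the target is a little-endian `ByteBuffer` over a `byte[]`, and an absolute `putLong` stands in for the single wide store that Unsafe would perform. The class and method names are hypothetical. The buffer needs 8 bytes of slack after the write position, because the full long is stored even when only a few bytes are kept.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PackedVarIntWriter {
    // Wrap a byte[] so that the low byte of a long lands at the lowest address.
    static ByteBuffer littleEndian(byte[] backing) {
        return ByteBuffer.wrap(backing).order(ByteOrder.LITTLE_ENDIAN);
    }

    // Encodes n as a zig-zag varint at pos and returns the new write position.
    // Requires pos + 8 <= buf.capacity().
    static int writeInt(int n, ByteBuffer buf, int pos) {
        int v = (n << 1) ^ (n >> 31);   // zig-zag
        long packed = 0L;               // varint bytes accumulated in a local
        int len = 0;
        while ((v & ~0x7F) != 0) {
            packed |= (long) ((v & 0x7F) | 0x80) << (8 * len);
            v >>>= 7;
            len++;
        }
        packed |= (long) v << (8 * len);
        len++;
        // One 8-byte store instead of up to five single-byte stores.
        buf.putLong(pos, packed);
        // "Rewind": advance only by the bytes that belong to this value;
        // the next write overwrites the zero padding.
        return pos + len;
    }
}
```

This also shows why the approach does not map onto an OutputStream: bytes past the returned position have already been written and are only corrected by the next store.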

I am not convinced that using Unsafe will help that much on the read side on 
the input buffer -- I've already optimized much of the read pipeline to avoid 
triggering array bounds checking, as you can see in the assembly output from 
the JIT.  It would help for native byte buffers, however (or byte buffers in 
general, which perform poorly for raw read/write access because the 
interface methods used to access them do not get inlined).  Reading doubles and 
floats in Perf.java is very fast -- the bottleneck for 
ReflectSmallFloatArrayRead is elsewhere.  Have you profiled it?
{noformat}
                    test name     time    M entries/sec   M bytes/sec  bytes/cycle
                    FloatRead:    399 ms     501.220      2004.882       1000000
                   FloatWrite:   1164 ms     171.812       687.248       1000000
                   DoubleRead:    399 ms     500.677      4005.417       2000000
                  DoubleWrite:   1896 ms     105.439       843.515       2000000
{noformat}


                
> Make use of the sun.misc.Unsafe class during serialization if a JDK supports 
> it
> -------------------------------------------------------------------------------
>
>                 Key: AVRO-1282
>                 URL: https://issues.apache.org/jira/browse/AVRO-1282
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.7.4
>            Reporter: Leo Romanoff
>            Priority: Minor
>         Attachments: avro-1282-v1.patch, avro-1282-v2.patch, 
> avro-1282-v3.patch, avro-1282-v4.patch, avro-1282-v5.patch, avro-1282-v6.patch
>
>
> Unsafe can be used to significantly speed up the serialization process, if a 
> JDK implementation supports sun.misc.Unsafe properly. Most JDKs running on PCs 
> support it. Some platforms, like Android, still lack proper support for Unsafe.
> There are three possibilities to use Unsafe for serialization:
> 1) Very quick access to the fields of objects. It is way faster than the 
> reflection-based approach using Field.get/set.
> 2) Input and output streams can use Unsafe to perform very quick 
> input/output.
> 3) Moreover, Unsafe makes it possible to serialize to/deserialize from 
> off-heap memory directly and very quickly, without any intermediate buffers 
> allocated on heap. There is virtually no overhead compared to the usual byte 
> arrays.
