[
https://issues.apache.org/jira/browse/AVRO-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638232#comment-13638232
]
Scott Carey commented on AVRO-1282:
-----------------------------------
I think changing the binary format for primitives is probably off the table.
Google got performance improvements by going from zig-zag encoding to group
varint encoding, which we tried in the past as an experiment without much luck
but with Unsafe we may do better. Group Varint encoding for the case of arrays
of ints and longs will be fast and significantly smaller than raw int/longs.
Extending the avro spec to allow for group varint encoding may be a better
choice than using fixed size int/long. float/double are already fixed size and
can use unsafe when the native ordering of the system matches (e.g. x86).
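
To make the group varint idea above concrete, here is a minimal sketch in Java. It is an illustration only, not part of the Avro spec or the Avro API: the class name GroupVarint and the layout (one tag byte holding four 2-bit lengths, then four little-endian values of 1 to 4 bytes each, with zig-zag applied first and the last group padded with zeros) are assumptions chosen for the example. It handles ints only and uses plain byte writes rather than Unsafe.

```java
import java.util.Arrays;

/** Hypothetical sketch of group varint for int arrays; not Avro code. */
final class GroupVarint {
    // Zig-zag maps small negative numbers to small unsigned ones.
    static int zigZag(int n) { return (n << 1) ^ (n >> 31); }
    static int unZigZag(int n) { return (n >>> 1) ^ -(n & 1); }

    // Number of bytes (1..4) needed to hold v when treated as unsigned.
    static int byteLength(int v) {
        if ((v & 0xFFFFFF00) == 0) return 1;
        if ((v & 0xFFFF0000) == 0) return 2;
        if ((v & 0xFF000000) == 0) return 3;
        return 4;
    }

    static byte[] encode(int[] values) {
        int groups = (values.length + 3) / 4;
        byte[] out = new byte[groups * 17]; // worst case: 1 tag + 4 * 4 bytes per group
        int pos = 0;
        for (int g = 0; g < groups; g++) {
            int tagPos = pos++;             // reserve the tag byte, fill it in below
            int tag = 0;
            for (int i = 0; i < 4; i++) {
                int idx = g * 4 + i;
                // Pad an incomplete last group with zeros (1 byte each).
                int v = idx < values.length ? zigZag(values[idx]) : 0;
                int len = byteLength(v);
                tag |= (len - 1) << (2 * i);
                for (int b = 0; b < len; b++) {
                    out[pos++] = (byte) (v >>> (8 * b)); // little-endian
                }
            }
            out[tagPos] = (byte) tag;
        }
        return Arrays.copyOf(out, pos);
    }

    // The caller must supply the element count, since padding is not self-describing.
    static int[] decode(byte[] in, int count) {
        int[] values = new int[count];
        int pos = 0;
        for (int base = 0; base < count; base += 4) {
            int tag = in[pos++] & 0xFF;
            for (int i = 0; i < 4; i++) {
                int len = ((tag >>> (2 * i)) & 3) + 1;
                int v = 0;
                for (int b = 0; b < len; b++) {
                    v |= (in[pos++] & 0xFF) << (8 * b);
                }
                if (base + i < count) values[base + i] = unZigZag(v);
            }
        }
        return values;
    }
}
```

The decoder learns all four lengths from a single tag byte instead of testing a continuation bit on every byte, which is where any speedup would come from. Whether the output is actually smaller than fixed-size values depends on the data.
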
Float and Double are already written as 4 or 8 byte chunks, and you can use the
float <> int bit conversion or double <> long bit conversion to pack ints or
longs that way if you wish to test performance differences.
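
For anyone who wants to try that experiment, a minimal sketch of the bit conversion might look like the following. The class and method names are hypothetical, and a little-endian ByteBuffer stands in for the real encoder. One caveat: the Javadoc for Float.intBitsToFloat and Double.longBitsToDouble notes that NaN bit patterns may not be preserved exactly on every platform, and many ints (small negative values, for example) are NaN bit patterns. So this is suitable for a performance test, not as a general encoding.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: pushes ints/longs through the fixed-size float/double path. */
final class FixedWidthPacking {
    // Reinterpret the int's bits as a float and write it as 4 little-endian bytes.
    static byte[] writeIntAsFloat(int v) {
        ByteBuffer buf = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        buf.putFloat(Float.intBitsToFloat(v));
        return buf.array();
    }

    // Read 4 bytes as a float, then recover the raw bits as an int.
    static int readIntFromFloat(byte[] bytes) {
        float f = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getFloat();
        return Float.floatToRawIntBits(f);
    }

    // Same idea for long <-> double, using 8 bytes.
    static byte[] writeLongAsDouble(long v) {
        ByteBuffer buf = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        buf.putDouble(Double.longBitsToDouble(v));
        return buf.array();
    }

    static long readLongFromDouble(byte[] bytes) {
        double d = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getDouble();
        return Double.doubleToRawLongBits(d);
    }
}
```
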
On the other hand, we could pack our variable-length writes into an int/long on
the stack, then use Unsafe rather than per-byte writes, which might help. This
would not work when writing to an OutputStream, but could when writing to byte
buffers or byte[]; since the fields are variable length, we would have to
'rewind'.
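
A rough sketch of that packing idea, under several assumptions: the class name PackedVarintWriter is hypothetical, it handles 32-bit ints only (a zig-zag varint int is at most 5 bytes, so it fits in one long), and a little-endian ByteBuffer.putLong stands in for the Unsafe store so that the example runs on any JDK. The 'rewind' is interpreted here as advancing the position by the encoded length only, so the next write overwrites the unused tail of the previous 8-byte store.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical sketch: build the varint in a long, then store it in one write. */
final class PackedVarintWriter {
    private final byte[] buf;
    private final ByteBuffer view;
    private int pos;

    PackedVarintWriter(int capacity) {
        buf = new byte[capacity + 8]; // 8 bytes of slack so the wide store never overruns
        view = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
    }

    void writeInt(int n) {
        int v = (n << 1) ^ (n >> 31);   // zig-zag
        long packed = 0;
        int len = 0;
        while ((v & ~0x7F) != 0) {      // more than 7 bits remain
            packed |= (long) ((v & 0x7F) | 0x80) << (8 * len);
            v >>>= 7;
            len++;
        }
        packed |= (long) v << (8 * len); // final byte, continuation bit clear
        len++;
        view.putLong(pos, packed);       // single 8-byte store (stand-in for Unsafe)
        pos += len;                      // keep only the bytes actually used
    }

    int position() { return pos; }

    byte[] toByteArray() { return Arrays.copyOf(buf, pos); }

    // Plain byte-at-a-time reader, used only to check the output.
    static int[] readInts(byte[] in, int count) {
        int[] out = new int[count];
        int p = 0;
        for (int i = 0; i < count; i++) {
            int v = 0, shift = 0, b;
            do {
                b = in[p++] & 0xFF;
                v |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            out[i] = (v >>> 1) ^ -(v & 1); // undo zig-zag
        }
        return out;
    }
}
```

The wide store always touches 8 bytes, which is why this cannot be done directly against an OutputStream and why the buffer needs slack at the end.
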
I am not convinced that using Unsafe will help that much on the read side on
the input buffer -- I've already optimized much of the read pipeline to avoid
triggering array bounds checking if you look at the output assembly from the
JIT. It would help for native byte buffers, however (or byte buffers in
general, which have poor performance for the raw read/write access because the
interface methods used to access them do not get inlined). Reading doubles and
floats in Perf.java is very fast -- the bottleneck for
ReflectSmallFloatArrayRead is elsewhere. Have you profiled it?
{noformat}
test name        time   M entries/sec   M bytes/sec   bytes/cycle
FloatRead:     399 ms         501.220      2004.882       1000000
FloatWrite:   1164 ms         171.812       687.248       1000000
DoubleRead:    399 ms         500.677      4005.417       2000000
DoubleWrite:  1896 ms         105.439       843.515       2000000
{noformat}
> Make use of the sun.misc.Unsafe class during serialization if a JDK supports
> it
> -------------------------------------------------------------------------------
>
> Key: AVRO-1282
> URL: https://issues.apache.org/jira/browse/AVRO-1282
> Project: Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.7.4
> Reporter: Leo Romanoff
> Priority: Minor
> Attachments: avro-1282-v1.patch, avro-1282-v2.patch,
> avro-1282-v3.patch, avro-1282-v4.patch, avro-1282-v5.patch, avro-1282-v6.patch
>
>
> Unsafe can be used to significantly speed up the serialization process if a JDK
> implementation supports sun.misc.Unsafe properly. Most JDKs running on PCs
> support it. Some platforms, like Android, still lack proper support for Unsafe.
> There are two ways to use Unsafe for serialization:
> 1) Very quick access to the fields of objects. It is much faster than the
> reflection-based approach using Field.get/set.
> 2) Input and output streams can use Unsafe to perform very quick
> input/output.
>
> 3) Moreover, Unsafe makes it possible to serialize to and deserialize from
> off-heap memory directly and very quickly, without any intermediate buffers
> allocated on the heap. There is virtually no overhead compared to the usual byte
> arrays.