[ https://issues.apache.org/jira/browse/AVRO-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638232#comment-13638232 ]
Scott Carey commented on AVRO-1282:
-----------------------------------

I think changing the binary format for primitives is probably off the table. Google got performance improvements by going from zig-zag encoding to group varint encoding. We tried that in the past as an experiment without much luck, but with Unsafe we may do better. Group varint encoding for arrays of ints and longs will be fast and significantly smaller than raw ints/longs. Extending the Avro spec to allow group varint encoding may be a better choice than using fixed-size int/long.

float/double are already fixed size and can use Unsafe when the native byte order of the system matches (e.g. x86). Float and Double are already written as 4- or 8-byte chunks, and you can use the float <> int or double <> long bit conversion to pack ints or longs that way if you wish to test performance differences.

On the other hand, we could pack our variable-length writes into an int/long on the stack, then use Unsafe rather than per-byte writes, which might help. This would not work when writing to an OutputStream, but it could work for writing to byte buffers or byte[]. Since the fields are variable length, we would have to 'rewind'.

I am not convinced that using Unsafe will help that much on the read side for the input buffer. I've already optimized much of the read pipeline to avoid triggering array bounds checking, as you can see if you look at the output assembly from the JIT. It would help for native byte buffers, however (or byte buffers in general, which have poor performance for raw read/write access because the interface methods used to access them do not get inlined).

Reading doubles and floats in Perf.java is very fast; the bottleneck for ReflectSmallFloatArrayRead is elsewhere. Have you profiled it?
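For anyone who wants to try the bit-conversion comparison, here is a minimal sketch of the two encodings side by side: a per-byte zig-zag varint write, and an int stored as a 4-byte little-endian float via the bit conversion. The class `VarintSketch` and its method names are hypothetical, made up for illustration, and are not Avro's encoder API. It uses a plain `ByteBuffer` rather than Unsafe, so it only shows the layouts, not the speed.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of Avro.
class VarintSketch {

    // Zig-zag maps small-magnitude signed ints to small unsigned ints:
    // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static int zigZag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    static int unZigZag(int z) {
        return (z >>> 1) ^ -(z & 1);
    }

    // Writes value as a zig-zag varint one byte at a time: 7 payload bits per
    // byte, high bit set on every byte except the last. Returns the new position.
    static int writeVarint(int value, byte[] buf, int pos) {
        int z = zigZag(value);
        while ((z & ~0x7F) != 0) {
            buf[pos++] = (byte) ((z & 0x7F) | 0x80);
            z >>>= 7;
        }
        buf[pos++] = (byte) z;
        return pos;
    }

    // Reads a zig-zag varint that starts at pos.
    static int readVarint(byte[] buf, int pos) {
        int z = 0;
        int shift = 0;
        int b;
        do {
            b = buf[pos++] & 0xFF;
            z |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return unZigZag(z);
    }

    // Fixed-width alternative: reinterpret the int's bits as a float and store
    // it as 4 bytes. Caveat: the Javadoc for Float.intBitsToFloat notes that
    // some NaN bit patterns may not survive the conversion on every platform,
    // so this is only suitable for experiments.
    static void writeIntAsFloat(int value, ByteBuffer buf) {
        buf.putFloat(Float.intBitsToFloat(value));
    }

    static int readIntFromFloat(ByteBuffer buf) {
        return Float.floatToRawIntBits(buf.getFloat());
    }

    public static void main(String[] args) {
        byte[] bytes = new byte[5]; // a 32-bit varint needs at most 5 bytes
        int len = writeVarint(300, bytes, 0);
        System.out.println("varint length for 300: " + len);
        System.out.println("varint decoded: " + readVarint(bytes, 0));

        ByteBuffer fixed = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        writeIntAsFloat(300, fixed);
        fixed.flip();
        System.out.println("fixed length for 300: " + fixed.remaining());
        System.out.println("fixed decoded: " + readIntFromFloat(fixed));
    }
}
```

The varint path takes between 1 and 5 bytes depending on magnitude, while the float path always takes 4, which is the size versus speed trade-off being discussed.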
{noformat}
test name             time   M entries/sec   M bytes/sec   bytes/cycle
FloatRead:          399 ms         501.220      2004.882       1000000
FloatWrite:        1164 ms         171.812       687.248       1000000
DoubleRead:         399 ms         500.677      4005.417       2000000
DoubleWrite:       1896 ms         105.439       843.515       2000000
{noformat}

> Make use of the sun.misc.Unsafe class during serialization if a JDK supports it
> -------------------------------------------------------------------------------
>
>                 Key: AVRO-1282
>                 URL: https://issues.apache.org/jira/browse/AVRO-1282
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.7.4
>            Reporter: Leo Romanoff
>            Priority: Minor
>         Attachments: avro-1282-v1.patch, avro-1282-v2.patch, avro-1282-v3.patch, avro-1282-v4.patch, avro-1282-v5.patch, avro-1282-v6.patch
>
>
> Unsafe can be used to significantly speed up the serialization process, if a JDK implementation supports sun.misc.Unsafe properly. Most JDKs running on PCs support it. Some platforms like Android do not yet have proper support for Unsafe.
> There are two possibilities to use Unsafe for serialization:
> 1) Very quick access to the fields of objects. It is way faster than the reflection-based approach using Field.get/set.
> 2) Input and output streams can use Unsafe to perform very quick input/output.
>
> 3) Moreover, Unsafe makes it possible to serialize to/deserialize from off-heap memory directly and very quickly, without any intermediate buffers allocated on heap. There is virtually no overhead compared to the usual byte arrays.