[ 
https://issues.apache.org/jira/browse/AVRO-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639339#comment-13639339
 ] 

Scott Carey commented on AVRO-1282:
-----------------------------------

Yes, ResolvingDecoder is a bottleneck.  I have several ideas for eliminating it 
completely, but they aren't trivial.  In general, decoding involves continuously 
traversing multiple data structures -- the Parser, the object graph being 
built or read, and the Schema.  Building the read pipeline instead by composing 
small functional pieces, with all of the resolution possibilities precomputed 
into a single composite function, will be faster.  This composite function 
could then be passed through ASM to 'devirtualize' it and inline some 
operations.  That is out of scope for this ticket, but improving what we can 
within the current framework is not a bad idea.
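
To make the composition idea concrete, here is a minimal sketch.  Every name 
in it (ComposedReaderSketch, Action, Step, RecordReader) is hypothetical and 
not part of Avro, and it uses fixed-width ints rather than Avro's real 
encoding to stay short.  The point is only that resolution decisions are made 
once, when the plan is compiled, and not once per datum:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types are Avro APIs.
final class ComposedReaderSketch {
    /** Outcome of resolving one writer field against the reader schema. */
    enum Action { READ_INT, READ_DOUBLE, SKIP_INT }

    /** One precomputed decoding step. */
    interface Step { void apply(ByteBuffer in, List<Object> out); }

    /** The composite function that decodes a whole record. */
    interface RecordReader { List<Object> read(ByteBuffer in); }

    /** Turns the resolved field list into a flat array of steps, once. */
    static RecordReader compile(List<Action> resolved) {
        Step[] plan = new Step[resolved.size()];
        for (int i = 0; i < plan.length; i++) {
            switch (resolved.get(i)) {
                case READ_INT:
                    plan[i] = (in, out) -> out.add(in.getInt());
                    break;
                case READ_DOUBLE:
                    plan[i] = (in, out) -> out.add(in.getDouble());
                    break;
                case SKIP_INT: // writer field the reader does not want
                    plan[i] = (in, out) -> in.position(in.position() + 4);
                    break;
                default:
                    throw new AssertionError("unhandled action");
            }
        }
        // Per-datum work is a plain loop; no schema or parser traversal.
        return in -> {
            List<Object> record = new ArrayList<>(plan.length);
            for (Step s : plan) {
                s.apply(in, record);
            }
            return record;
        };
    }
}
```

A caller would invoke compile(...) once per (writer, reader) schema pair and 
reuse the returned RecordReader for every datum.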

Re:  Float/Double
How much do those improvements help the Generic tests? I assume they are still 
dominated by the issues with the decoder.  Although decreasing the time of a 
Float read by 35% is a big win, it is a small part of what happens with 
Generic/Specific/Reflect reading.  Does using Unsafe here break anything if the 
system's byte order does not match the little-endian order of the encoding?
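
For illustration, here is a small sketch of the byte-order concern.  Avro 
writes a float as 4 bytes in little-endian order, so a raw native-order load 
is only correct on a little-endian machine unless the bytes are swapped.  The 
class and method names are hypothetical, and ByteBuffer with an explicit order 
stands in for a raw Unsafe load so the example runs anywhere:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch; ByteBuffer.order(...) simulates a native-order load.
final class FloatByteOrderSketch {
    /** Portable decode: assembles the little-endian int by hand. */
    static float readFloatPortable(byte[] buf, int pos) {
        int bits = (buf[pos] & 0xff)
                 | (buf[pos + 1] & 0xff) << 8
                 | (buf[pos + 2] & 0xff) << 16
                 | (buf[pos + 3] & 0xff) << 24;
        return Float.intBitsToFloat(bits);
    }

    /** Raw load in the given machine order, with no correction applied. */
    static float readFloatRaw(byte[] buf, int pos, ByteOrder machineOrder) {
        int raw = ByteBuffer.wrap(buf, pos, 4).order(machineOrder).getInt();
        return Float.intBitsToFloat(raw);
    }

    /** Raw load followed by a byte swap when the machine is big-endian. */
    static float readFloatRawFixed(byte[] buf, int pos, ByteOrder machineOrder) {
        int raw = ByteBuffer.wrap(buf, pos, 4).order(machineOrder).getInt();
        int bits = machineOrder == ByteOrder.LITTLE_ENDIAN
                 ? raw
                 : Integer.reverseBytes(raw);
        return Float.intBitsToFloat(bits);
    }
}
```

On a real system the machineOrder argument would be ByteOrder.nativeOrder(); 
the swap costs a single instruction on most hardware, so a guarded fast path 
need not give up much.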

Your results for reading Floats/Doubles above are slower than mine; is this 
because you are now populating an array when reading as well?  The benchmark 
was intended to isolate (as best as possible) the read and write portions of 
the decoder/encoder.  I'd be interested in looking at this code.


As for arrays of primitives, we should look into Group varint encoding and 
similar techniques: 
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/people/jeff/WSDM09-keynote.pdf
  (go to page 55).
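
As a rough sketch of the group varint idea: four values share one tag byte 
that holds four 2-bit lengths, followed by each value in 1 to 4 bytes.  The 
bit layout of the tag and the little-endian value bytes below are assumptions 
made for the example, not a proposed Avro wire format, and the class name is 
hypothetical.  Values are treated as unsigned 32-bit, so signed data would 
need a zigzag step first:

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of group varint; not an Avro encoding.
final class GroupVarintSketch {
    /** Number of bytes (1..4) needed for v, treated as unsigned. */
    static int byteLength(int v) {
        if ((v & 0xFFFFFF00) == 0) return 1;
        if ((v & 0xFFFF0000) == 0) return 2;
        if ((v & 0xFF000000) == 0) return 3;
        return 4;
    }

    /** Encodes exactly four values: one tag byte, then the value bytes. */
    static byte[] encode(int[] group) {
        if (group.length != 4) {
            throw new IllegalArgumentException("need exactly 4 values");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(17);
        int tag = 0;
        for (int i = 0; i < 4; i++) {
            tag |= (byteLength(group[i]) - 1) << (2 * i); // 2 bits per value
        }
        out.write(tag);
        for (int v : group) {
            for (int n = byteLength(v); n > 0; n--) {
                out.write(v & 0xff); // low byte first
                v >>>= 8;
            }
        }
        return out.toByteArray();
    }

    /** Decodes four values starting at pos; no per-byte continuation test. */
    static int[] decode(byte[] buf, int pos) {
        int tag = buf[pos++] & 0xff;
        int[] group = new int[4];
        for (int i = 0; i < 4; i++) {
            int len = ((tag >>> (2 * i)) & 3) + 1;
            int v = 0;
            for (int b = 0; b < len; b++) {
                v |= (buf[pos++] & 0xff) << (8 * b);
            }
            group[i] = v;
        }
        return group;
    }
}
```

The appeal for arrays of primitives is that the decoder reads all four lengths 
from the tag at once, where a plain varint must test a continuation bit on 
every byte.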


In general, let's separate out the Unsafe enhancements for Reflection from those 
that are for the Input/Output streams.  Let's get the reflection work in this 
ticket done and committed, then move on to other enhancements.


We have three tickets then:
* This one related to reflection improvements via Unsafe.
* Another one related to input/output improvements via Unsafe.
* A third related to other non-Unsafe performance improvements, for example 
those you allude to with:
{quote}
Coming back to your message about reading doubles and floats and observation 
that they are much slower with ReflectDatumReader/Writer. I just did some 
profiling and found some issues. With small changes, I could significantly 
improve write performance and improved a bit read performance. All this without 
Unsafe reads/writes into streams yet.
{quote}

I think we should try and separate these from each other.

                
> Make use of the sun.misc.Unsafe class during serialization if a JDK supports 
> it
> -------------------------------------------------------------------------------
>
>                 Key: AVRO-1282
>                 URL: https://issues.apache.org/jira/browse/AVRO-1282
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.7.4
>            Reporter: Leo Romanoff
>            Priority: Minor
>         Attachments: avro-1282-v1.patch, avro-1282-v2.patch, 
> avro-1282-v3.patch, avro-1282-v4.patch, avro-1282-v5.patch, avro-1282-v6.patch
>
>
> Unsafe can be used to significantly speed up the serialization process if a 
> JDK implementation supports sun.misc.Unsafe properly. Most JDKs running on PCs 
> support it. Some platforms, like Android, do not yet have proper support for 
> Unsafe.
> There are two possibilities to use Unsafe for serialization:
> 1) Very quick access to the fields of objects. It is way faster than the 
> reflection-based approach using Field.get/set.
> 2) Input and Output streams can use Unsafe to perform very quick 
> input/output.
>  
> 3) Moreover, Unsafe makes it possible to serialize to/deserialize from 
> off-heap memory directly and very quickly, without any intermediate buffers 
> allocated on heap. There is virtually no overhead compared to the usual byte 
> arrays.

