[ 
https://issues.apache.org/jira/browse/AVRO-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641789#comment-13641789
 ] 

Leo Romanoff commented on AVRO-1282:
------------------------------------

> On the bug side, FieldAccessors are no longer cached by Class + avro schema 
> index as this is unsafe since two avro schemas can 
> apply to the same class, and have different order for fields or number of 
> fields. The only safe way to cache is to cache by 
> Class + field name, so FieldAccessor[] turned into Map<String, FieldAccessor> 
> which slows things down a little when field 
> access is heavy.

Hmm, but this would introduce a map lookup on each field access. Wouldn't it be 
better to change how FIELDS_ARRAY_CACHE is keyed instead? E.g. we map a compound 
key of class + schema (as a string?) to the FieldAccessor[]. This way we get 
separate mappings for different class/schema combinations, but we only pay the 
lookup overhead when we do getState and not on each iteration of the loop over 
struct fields. What do you think?
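
To make the idea concrete, here is a rough sketch of such a cache. It is not 
code from the patch: AccessorCacheSketch, Key, getAccessors and the one-method 
FieldAccessor stub are hypothetical names, and the schema is assumed to be 
available as a string that identifies it uniquely. Only the FIELDS_ARRAY_CACHE 
name comes from the discussion above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class AccessorCacheSketch {

    /** Hypothetical stand-in for the patch's FieldAccessor. */
    public interface FieldAccessor {
        String fieldName();
    }

    /** Compound cache key: the class plus the schema rendered as a string. */
    public static final class Key {
        private final Class<?> type;
        private final String schema;

        public Key(Class<?> type, String schema) {
            this.type = type;
            this.schema = schema;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return type == other.type && schema.equals(other.schema);
        }

        @Override
        public int hashCode() {
            return 31 * type.hashCode() + schema.hashCode();
        }
    }

    // One accessor array per (class, schema) pair, in that schema's field order.
    private static final Map<Key, FieldAccessor[]> FIELDS_ARRAY_CACHE =
        new ConcurrentHashMap<Key, FieldAccessor[]>();

    /**
     * Looks up (or builds) the accessor array once per call; the caller can
     * then index into it in the per-field loop without further map lookups.
     */
    public static FieldAccessor[] getAccessors(Class<?> type, String schema,
                                               String[] fieldNames) {
        Key key = new Key(type, schema);
        FieldAccessor[] cached = FIELDS_ARRAY_CACHE.get(key);
        if (cached == null) {
            FieldAccessor[] built = new FieldAccessor[fieldNames.length];
            for (int i = 0; i < fieldNames.length; i++) {
                final String name = fieldNames[i];
                built[i] = new FieldAccessor() {
                    public String fieldName() { return name; }
                };
            }
            // If another thread won the race, keep its array.
            FieldAccessor[] raced = FIELDS_ARRAY_CACHE.putIfAbsent(key, built);
            cached = (raced != null) ? raced : built;
        }
        return cached;
    }
}
```

Two schemas for the same class then get two independent arrays, each in its 
own field order, and the cost of hashing the schema string is paid once per 
getAccessors call rather than once per field.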

> Additionally, I fixed the bug where we were using native accessors for boxed 
> objects, which would lead to heap corruption.

Good catch!

> Array reading still has a bug. The arrays are assumed to be of size equal to 
> the first array block length, but this is not 
> true. We should get ArrayIndexOutOfBounds if we used the blocked encoding for 
> arrays. These will need to grow in size if there 
> is more than one block, or to chain together and flatten when complete.

Yes, this case is not handled yet, but it should be easy to fix, I'd say: we 
can simply grow the array as further blocks arrive.
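
Roughly what I have in mind, as a sketch only and not the code in the patch: 
BlockedArraySketch, BlockSource, readIntArray and fromBlocks are made-up names, 
and BlockSource merely imitates a decoder that reports one block count at a 
time and 0 at the end of the array (similar in spirit to what Decoder's 
readArrayStart()/arrayNext() report).

```java
import java.util.Arrays;

public class BlockedArraySketch {

    /** Hypothetical stand-in for a decoder reading a blocked array of ints. */
    public interface BlockSource {
        /** Item count of the next block, or 0 when the array is finished. */
        long nextBlockCount();
        int readInt();
    }

    /** Reads all blocks, growing the target array whenever a block does not fit. */
    public static int[] readIntArray(BlockSource in) {
        int[] result = new int[0];
        int size = 0;
        for (long n = in.nextBlockCount(); n != 0; n = in.nextBlockCount()) {
            int needed = Math.addExact(size, Math.toIntExact(n));
            if (needed > result.length) {
                // Grow to at least double the capacity to limit re-copying.
                result = Arrays.copyOf(result, Math.max(needed, result.length * 2));
            }
            for (long i = 0; i < n; i++) {
                result[size++] = in.readInt();
            }
        }
        // Trim spare capacity so the caller sees exactly the items read.
        return size == result.length ? result : Arrays.copyOf(result, size);
    }

    /** Test helper: exposes pre-split blocks through the BlockSource interface. */
    public static BlockSource fromBlocks(final int[][] blocks) {
        return new BlockSource() {
            private int block = -1;
            private int pos = 0;

            public long nextBlockCount() {
                block++;
                pos = 0;
                return block < blocks.length ? blocks[block].length : 0;
            }

            public int readInt() {
                return blocks[block][pos++];
            }
        };
    }
}
```

With a single block this behaves like the current code (one allocation of 
exactly the first block's length); only additional blocks trigger a copy.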


> There is opportunity for better performance with intrinsic arrays if we do 
> away with the iterator for them and use a 
> specialized loop for the raw array case.

But I already introduced this optimization in the latest versions of the patch 
that I submitted. Have you seen them?
                
> Make use of the sun.misc.Unsafe class during serialization if a JDK supports 
> it
> -------------------------------------------------------------------------------
>
>                 Key: AVRO-1282
>                 URL: https://issues.apache.org/jira/browse/AVRO-1282
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.7.4
>            Reporter: Leo Romanoff
>            Priority: Minor
>         Attachments: AVRO-1282-s1.patch, avro-1282-v1.patch, 
> avro-1282-v2.patch, avro-1282-v3.patch, avro-1282-v4.patch, 
> avro-1282-v5.patch, avro-1282-v6.patch, avro-1282-v7.patch, 
> avro-1282-v8.patch, TestUnsafeUtil.java
>
>
> Unsafe can be used to significantly speed up the serialization process, if a 
> JDK implementation supports sun.misc.Unsafe properly. Most JDKs running on PCs 
> support it. Some platforms like Android still lack proper support for Unsafe.
> There are several ways to use Unsafe for serialization:
> 1) Very quick access to the fields of objects. It is way faster than the 
> reflection-based approach using Field.get/set.
> 2) Input and output streams can use Unsafe to perform very quick 
> input/output.
> 3) Moreover, Unsafe makes it possible to serialize to/deserialize from 
> off-heap memory directly and very quickly, without any intermediate buffers 
> allocated on heap. There is virtually no overhead compared to the usual byte 
> arrays.
