[
https://issues.apache.org/jira/browse/AVRO-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640081#comment-13640081
]
Scott Carey commented on AVRO-1282:
-----------------------------------
Regarding copying of data into an array on read:
In many cases the data is read not into an array but into an object. Object
field writes and constructor parameters are faster than array writes because
there is no bounds checking and therefore fewer branches.
Yes, it is more realistic for a read test to put the data somewhere and for a
write test to take it from somewhere, but that is what the
Generic/Specific/Reflect tests are for. The tests discussed here are meant to
isolate the BinaryEncoder or BinaryDecoder as much as possible. Perhaps we can
get the encoder tests to avoid the array read.
We may want both variations of performance tests in the long run so that we can
see the isolated parts as well as the effect when mixed with other likely
activity.
Regarding Group Varint and raw arrays for int/long:
We want to keep our compactness of representation for the same reasons Google
does. But in any case, changing the format is a big deal -- it would likely
require changes in every language implementation and a rev of Avro to 2.0 (we
only change binary representation in major versions). Such a big change would
take time and likely include many other spec breaking features.
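
For readers unfamiliar with the representation being defended here: the Avro
specification encodes int and long values with variable-length zig-zag coding,
so small magnitudes take one byte. The following is only a minimal sketch of
that scheme for ints. The class and method names are hypothetical and this is
not Avro's BinaryEncoder code.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration class; not part of Avro.
final class ZigZagVarintSketch {

    // Encode a 32-bit int as a zig-zag varint (1 to 5 bytes).
    static byte[] encodeInt(int n) {
        // Zig-zag maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
        int v = (n << 1) ^ (n >> 31);
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);
        // Emit 7 bits at a time, low bits first; the high bit of each
        // byte marks that more bytes follow.
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
        return out.toByteArray();
    }

    // Decode a single zig-zag varint that starts at buf[0].
    static int decodeInt(byte[] buf) {
        int v = 0;
        int shift = 0;
        for (byte b : buf) {
            v |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // last byte of this value
            }
            shift += 7;
        }
        // Undo the zig-zag mapping.
        return (v >>> 1) ^ -(v & 1);
    }
}
```

With this scheme values from -64 to 63 fit in one byte, whereas a raw int array
always spends four bytes per value. That size difference is the compactness
trade-off referred to above.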
As an aside, for array copying System.arraycopy can be faster than Unsafe for
cases where the copy is between two arrays and the call site is not megamorphic
and can be in-lined. In that case, the JIT can turn the array copy into one
instruction on x86 (REP MOVSB), no matter how long the array is.
Interestingly, in some cases it is fastest to write:
{code}
for (int i = 0; i < out.length; i++) {
  out[i] = arr[i];
}
{code}
than even System.arraycopy, as the JIT converts the above into a single
instruction if the bounds check can be eliminated or pushed outside the loop,
while with System.arraycopy it uses a stub that first checks whether the arrays
are not the same and the sizes are larger than a threshold.
More interestingly, to get the above to compile to that speed the loop can't be
in a shared method that has many call sites, so copying the code multiple times
can help.
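
To make the two variants concrete, here is a small sketch that puts the manual
loop and System.arraycopy side by side. The class and method names are
hypothetical, and the byte[] element type is an assumption. The sketch only
shows that both produce the same result; which one is faster depends on the
JVM, the call site, and the array sizes, and would have to be measured.

```java
// Hypothetical illustration class; not part of Avro.
final class ArrayCopySketch {

    // Manual loop. Per the comment above, the JIT may compile this very
    // tightly when the bounds checks can be eliminated or hoisted, which
    // is more likely when the loop is written at the call site rather
    // than in a shared helper like this one.
    static void copyWithLoop(byte[] arr, byte[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = arr[i];
        }
    }

    // Library call. Copies out.length elements from the start of arr
    // to the start of out.
    static void copyWithArraycopy(byte[] arr, byte[] out) {
        System.arraycopy(arr, 0, out, 0, out.length);
    }
}
```

Both methods assume arr is at least as long as out. With a shorter arr both
throw an index-out-of-bounds exception, but System.arraycopy fails before
copying anything while the loop fails after a partial copy.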
> Make use of the sun.misc.Unsafe class during serialization if a JDK supports
> it
> -------------------------------------------------------------------------------
>
> Key: AVRO-1282
> URL: https://issues.apache.org/jira/browse/AVRO-1282
> Project: Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.7.4
> Reporter: Leo Romanoff
> Priority: Minor
> Attachments: avro-1282-v1.patch, avro-1282-v2.patch,
> avro-1282-v3.patch, avro-1282-v4.patch, avro-1282-v5.patch, avro-1282-v6.patch
>
>
> Unsafe can be used to significantly speed up the serialization process, if a
> JDK implementation supports sun.misc.Unsafe properly. Most JDKs running on PCs
> support it. Some platforms, like Android, do not yet support Unsafe properly.
> There are two possibilities to use Unsafe for serialization:
> 1) Very quick access to the fields of objects. It is way faster than with the
> reflection-based approach using Field.get/set
> 2) Input and output streams can use Unsafe to perform very quick
> input/output.
>
> 3) Moreover, Unsafe makes it possible to serialize to/deserialize from
> off-heap memory directly and very quickly, without any intermediate buffers
> allocated on heap. There is virtually no overhead compared to the usual byte
> arrays.