[
https://issues.apache.org/jira/browse/ARROW-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985680#comment-16985680
]
Azim Afroozeh commented on ARROW-3495:
--------------------------------------
Hi all,
I'm opening the following pull request for this issue:
[https://github.com/apache/arrow/pull/5930]
This patch makes the following changes:
* moves two common functions, "getNullCount" and
"splitAndTransferValidityBuffer", to the top-level BaseValueVector class. This
change requires moving "validityBuffer" into BaseValueVector as well (as
recommended in this TODO:
[https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/BaseFixedWidthVector.java#L89])
* optimizes the implementation of loadValidityBuffer (in BaseValueVector)
to just pass a reference to the validity buffer instead of creating a new one
* optimizes for the common boundary case in which all values are valid (as
done in the C++ code:
[https://github.com/apache/arrow/blob/master/cpp/src/arrow/array.h#L290])
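To illustrate the three changes above, here is a minimal, dependency-free sketch. It is not Arrow's actual code: the class names (BaseVectorSketch, IntVectorSketch), the byte[] storage standing in for ArrowBuf, and the cached allValid flag are all hypothetical. It shows a base class that owns the validity bitmap and shares getNullCount, a loadValidityBuffer that keeps a reference rather than allocating a new buffer, and a read loop that skips per-value bit checks when the null count is zero.

```java
/**
 * Hypothetical sketch (not Arrow's real classes): the base class owns the
 * validity bitmap (LSB-first, 1 = valid) so subclasses share null counting,
 * and readers take a fast path when every value is valid.
 */
public class ValidityFastPathSketch {

    static abstract class BaseVectorSketch {
        protected byte[] validityBuffer;
        protected int valueCount;
        protected boolean allValid; // hypothetical flag cached at load time

        /** Keeps a reference to the incoming bitmap instead of copying it. */
        void loadValidityBuffer(byte[] bitmap, int count) {
            this.validityBuffer = bitmap;
            this.valueCount = count;
            this.allValid = getNullCount() == 0;
        }

        /** Counts unset bits among the first valueCount bits of the bitmap. */
        int getNullCount() {
            int fullBytes = valueCount / 8;
            int setBits = 0;
            for (int i = 0; i < fullBytes; i++) {
                setBits += Integer.bitCount(validityBuffer[i] & 0xFF);
            }
            int remainder = valueCount % 8;
            if (remainder != 0) {
                int mask = (1 << remainder) - 1; // ignore padding bits past valueCount
                setBits += Integer.bitCount(validityBuffer[fullBytes] & mask);
            }
            return valueCount - setBits;
        }

        /** Tests a single validity bit with a shift and mask. */
        boolean isSet(int index) {
            return ((validityBuffer[index >> 3] >> (index & 7)) & 1) != 0;
        }
    }

    /** Only the value storage is type-specific; validity logic is inherited. */
    static final class IntVectorSketch extends BaseVectorSketch {
        final int[] values;

        IntVectorSketch(byte[] bitmap, int[] values) {
            this.values = values;
            loadValidityBuffer(bitmap, values.length);
        }

        /** Sums non-null values, skipping bitmap checks when all are valid. */
        long sumNonNull() {
            long sum = 0;
            if (allValid) {
                for (int v : values) sum += v;
            } else {
                for (int i = 0; i < valueCount; i++) {
                    if (isSet(i)) sum += values[i];
                }
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        // 0xFB = 1111_1011: index 2 is null; 0x03 marks indices 8 and 9 valid
        IntVectorSketch mixed = new IntVectorSketch(new byte[] {(byte) 0xFB, 0x03}, data);
        IntVectorSketch dense = new IntVectorSketch(new byte[] {(byte) 0xFF, 0x03}, data);
        System.out.println(mixed.getNullCount() + " " + mixed.sumNonNull()); // prints "1 52"
        System.out.println(dense.getNullCount() + " " + dense.sumNonNull()); // prints "0 55"
    }
}
```

The fast path costs one branch per batch instead of one bit test per value, which is why the all-valid case is worth special-casing.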
The optimization delivers a significant performance improvement.
Test: reading 50M integers from a single Int column (2GB).
Before the patch:
Baseline: 7.64 Gb/sec
With the Holder API: 9.99 Gb/sec
After the patch (with the bitmap condition checks):
Baseline: 12.13 Gb/sec (+58.7% gain)
With the Holder API: 16.03 Gb/sec (+60.4% gain)
> [Java] Optimize bit operations performance
> ------------------------------------------
>
> Key: ARROW-3495
> URL: https://issues.apache.org/jira/browse/ARROW-3495
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Java
> Affects Versions: 0.11.0
> Reporter: Li Jin
> Assignee: Animesh Trivedi
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> From [~atrivedi]'s benchmark finding:
> 2) Materialize values from Validity and Value direct buffers instead of
> calling getInt() function on the IntVector. This is implemented as a new
> Unsafe reader type (
> [https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L31]
> )
> 3) Optimize bitmap operation to check if a bit is set or not (
> [https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L23]
> )
--
This message was sent by Atlassian Jira
(v8.3.4#803005)