[ 
https://issues.apache.org/jira/browse/ARROW-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985680#comment-16985680
 ] 

Azim Afroozeh commented on ARROW-3495:
--------------------------------------

Hi all,

I'm opening the following pull request for this issue: 
[https://github.com/apache/arrow/pull/5930]

This patch makes the following changes:
 * moves two common functions, "getNullCount" and 
"splitAndTransferValidityBuffer", to the top-level BaseValueVector. This change 
requires moving "validityBuffer" to the BaseValueVector class (as recommended 
in this TODO: 
[https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/BaseFixedWidthVector.java#L89])
 * optimizes the implementation of loadValidityBuffer (in BaseValueVector) 
to just pass the reference to the validity buffer instead of copying it
 * optimizes for the common boundary condition when all values are valid (as 
done in the C++ code: 
[https://github.com/apache/arrow/blob/master/cpp/src/arrow/array.h#L290])
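
To illustrate the last point, here is a minimal sketch of the all-valid fast path. It is not the code from the pull request: it uses plain Java arrays instead of Arrow buffers, and the class and method names (ValidityBitmapSketch, isSet, nullCount, sum) are hypothetical. The only assumption taken from Arrow is the bitmap layout, where bit i of the validity buffer (least-significant bit first within each byte) is 1 when value i is non-null.

```java
// Hypothetical sketch, not Arrow's API: plain arrays stand in for Arrow buffers.
final class ValidityBitmapSketch {

    // True if value `index` is valid (bit set), LSB-first within each byte.
    static boolean isSet(byte[] validity, int index) {
        return (validity[index >> 3] & (1 << (index & 7))) != 0;
    }

    // Counts nulls by counting set bits over the first `valueCount` bits.
    static int nullCount(byte[] validity, int valueCount) {
        int fullBytes = valueCount >> 3;
        int setBits = 0;
        for (int i = 0; i < fullBytes; i++) {
            setBits += Integer.bitCount(validity[i] & 0xFF);
        }
        int remainder = valueCount & 7;
        if (remainder != 0) {
            // Ignore padding bits past valueCount in the last byte.
            int mask = (1 << remainder) - 1;
            setBits += Integer.bitCount(validity[fullBytes] & mask);
        }
        return valueCount - setBits;
    }

    // Sums the valid values; skips the per-value bit check when nothing is null.
    static long sum(int[] values, byte[] validity, int valueCount) {
        long total = 0;
        if (nullCount(validity, valueCount) == 0) {
            // Fast path: every value is valid, so the bitmap is never consulted.
            for (int i = 0; i < valueCount; i++) {
                total += values[i];
            }
        } else {
            // Slow path: check the validity bit for each value.
            for (int i = 0; i < valueCount; i++) {
                if (isSet(validity, i)) {
                    total += values[i];
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        byte[] allValid = {(byte) 0xFF, (byte) 0x03};
        System.out.println(sum(values, allValid, values.length));
    }
}
```

In a real reader the null count would be computed once per batch (or taken from the batch metadata) rather than on every call, so the fast path costs a single comparison per batch.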

The optimizations improve read throughput by roughly 60% in the test below.

Tests: Read 50M integers from a single Int column (2GB).

Before the patch:
Baseline: 7.64 Gb/sec
With the Holder API: 9.99 Gb/sec

After the patch (with the bitmap condition checks):
Baseline: 12.13 Gb/sec (+58.7% gains)
With the Holder API: 16.03 Gb/sec (+60.4% gains)

> [Java] Optimize bit operations performance
> ------------------------------------------
>
>                 Key: ARROW-3495
>                 URL: https://issues.apache.org/jira/browse/ARROW-3495
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java
>    Affects Versions: 0.11.0
>            Reporter: Li Jin
>            Assignee: Animesh Trivedi
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> From [~atrivedi]'s benchmark finding:
> 2) Materialize values from Validity and Value direct buffers instead of
> calling getInt() function on the IntVector. This is implemented as a new
> Unsafe reader type (
> [https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L31]
> )
> 3) Optimize bitmap operation to check if a bit is set or not (
> [https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L23]
> )



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
