[ https://issues.apache.org/jira/browse/ARROW-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102412#comment-17102412 ]

Yibo Cai commented on ARROW-8553:
---------------------------------

I did a quick test: the performance improvement with VisitWords is promising (>10x).

There's one issue that needs to be addressed. I would like to hear your comments,
[~apitrou], [~bkietz].
VisitWords calls the visitor on each word, but the visitor doesn't know how many
bits the first word holds; it may be fewer than a full word. See
[code|https://github.com/apache/arrow/blob/6002ec388840de5622e39af85abdc57a2cccc9b2/cpp/src/arrow/util/bit_util.h#L960].
This makes it hard to use VisitWords for bitmap operations (and, or, ...): I
don't know how many valid bits of the first word to write to the output buffer,
so the bit offsets of the later words cannot be determined either. VisitWords
does return the bit length of the first word, but only after all visitor calls
have finished, which is too late.
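A simplified model of the problem, assuming 64-bit words aligned to the start of the buffer. This is a sketch, not Arrow's actual {{VisitWords}} logic, and {{BitsPerWord}} is a hypothetical helper: when a slice starts at an unaligned bit offset, the visited words carry a short first word, then full words, then possibly a short last word.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: for a bitmap slice that starts at bit `offset` and
// spans `length` bits, return how many slice bits land in each 64-bit word
// of the underlying buffer, in visiting order.
std::vector<int64_t> BitsPerWord(int64_t offset, int64_t length) {
  assert(offset >= 0 && length >= 0);
  std::vector<int64_t> counts;
  const int64_t end = offset + length;
  int64_t pos = offset;
  while (pos < end) {
    // Next 64-bit word boundary after `pos`.
    const int64_t word_end = (pos / 64 + 1) * 64;
    const int64_t n = std::min(word_end, end) - pos;
    counts.push_back(n);
    pos += n;
  }
  return counts;
}
```

For example, {{BitsPerWord(5, 200)}} yields {{{59, 64, 64, 13}}}: a visitor that only receives the word values cannot tell that the first one carries 59 bits rather than 64.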

I recommend adding a "valid bits" parameter to the visitor function that tells
it how many bits are valid in the current word. Only the first and last words
can be shorter than a full word.
What's your opinion? Or is there a better way? Thanks.
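A rough sketch of how the proposed parameter could be used for an AND of two unaligned bitmaps. All names ({{VisitWordPairs}}, {{LoadBits}}, {{BitWriter}}, {{BitmapAndSketch}}, {{GetBit}}) are hypothetical, bitmaps are assumed to be LSB-first arrays of {{uint64_t}}, and chunk boundaries are assumed to follow the first input's word alignment. Because each visitor call receives {{valid_bits}}, it can append exactly that many bits to the output, and the output position of every later word follows from the running total.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: bitmaps are LSB-first arrays of 64-bit words.
// Reads n (1..64) bits starting at bit position pos, right-aligned.
uint64_t LoadBits(const uint64_t* words, int64_t pos, int n) {
  assert(n >= 1 && n <= 64);
  const int64_t i = pos / 64;
  const int s = static_cast<int>(pos % 64);
  uint64_t v = words[i] >> s;
  // Pull in the remaining high bits from the next word if the range spans two.
  if (s + n > 64) v |= words[i + 1] << (64 - s);
  return n == 64 ? v : (v & ((uint64_t{1} << n) - 1));
}

// Hypothetical visitor API: every call carries the number of valid bits.
// Chunks follow the word alignment of `a`, so only the first and last
// chunks can be shorter than 64 bits.
template <typename Visitor>
void VisitWordPairs(const uint64_t* a, int64_t a_off, const uint64_t* b,
                    int64_t b_off, int64_t length, Visitor&& visit) {
  int64_t done = 0;
  while (done < length) {
    const int64_t to_boundary = 64 - (a_off + done) % 64;
    const int n = static_cast<int>(std::min(to_boundary, length - done));
    visit(LoadBits(a, a_off + done, n), LoadBits(b, b_off + done, n), n);
    done += n;
  }
}

// Appends n-bit chunks to an output bitmap that starts at bit 0.
struct BitWriter {
  std::vector<uint64_t> words;
  int64_t pos = 0;  // number of bits written so far

  void Append(uint64_t bits, int n) {  // bits must already be masked to n bits
    if (n <= 0) return;
    const int64_t idx = pos / 64;
    const int sh = static_cast<int>(pos % 64);
    if (sh == 0) words.push_back(0);  // start a fresh output word
    words[idx] |= bits << sh;
    // Spill the high part into the next word when the chunk crosses a boundary.
    if (sh + n > 64) words.push_back(bits >> (64 - sh));
    pos += n;
  }
};

std::vector<uint64_t> BitmapAndSketch(const uint64_t* a, int64_t a_off,
                                      const uint64_t* b, int64_t b_off,
                                      int64_t length) {
  BitWriter out;
  VisitWordPairs(a, a_off, b, b_off, length,
                 [&out](uint64_t wa, uint64_t wb, int valid_bits) {
                   out.Append(wa & wb, valid_bits);
                 });
  return out.words;
}

bool GetBit(const uint64_t* words, int64_t i) {
  return (words[i / 64] >> (i % 64)) & 1;
}
```

Passing the count with each word means the visitor never has to infer alignment on its own, whereas a first-word length returned after {{VisitWords}} finishes arrives too late to be used.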

> [C++] Reimplement BitmapAnd using Bitmap::VisitWords
> ----------------------------------------------------
>
>                 Key: ARROW-8553
>                 URL: https://issues.apache.org/jira/browse/ARROW-8553
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 0.17.0
>            Reporter: Antoine Pitrou
>            Assignee: Yibo Cai
>            Priority: Major
>
> Currently, {{BitmapAnd}} uses a bit-by-bit loop for unaligned inputs. Using 
> {{Bitmap::VisitWords}} instead would probably yield a manyfold performance 
> increase.
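For comparison, the bit-by-bit approach the issue refers to can be sketched as below. This is a simplified illustration with hypothetical names ({{BitmapAndBitByBit}}, {{GetBitAt}}, {{SetBitAt}}), not Arrow's actual code; it assumes Arrow's LSB-first bit order within each byte. Each output bit costs two bit extractions and a read-modify-write of an output byte, which is the per-bit overhead a word-at-a-time visitor spreads over 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// LSB-first bit access within each byte.
inline bool GetBitAt(const uint8_t* bits, int64_t i) {
  return (bits[i >> 3] >> (i & 7)) & 1;
}

inline void SetBitAt(uint8_t* bits, int64_t i, bool value) {
  const uint8_t mask = static_cast<uint8_t>(1u << (i & 7));
  if (value) {
    bits[i >> 3] |= mask;
  } else {
    bits[i >> 3] &= static_cast<uint8_t>(~mask);
  }
}

// Hypothetical baseline: one loop iteration per bit, works for any offsets.
void BitmapAndBitByBit(const uint8_t* left, int64_t left_offset,
                       const uint8_t* right, int64_t right_offset,
                       int64_t length, int64_t out_offset, uint8_t* out) {
  assert(length >= 0);
  for (int64_t i = 0; i < length; ++i) {
    SetBitAt(out, out_offset + i,
             GetBitAt(left, left_offset + i) && GetBitAt(right, right_offset + i));
  }
}
```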



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
