[ https://issues.apache.org/jira/browse/ARROW-9842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310461#comment-17310461 ]

Yibo Cai commented on ARROW-9842:
---------------------------------

Some observations:
- Manually bitpacking with _mm_movemask_epi8 can improve compare kernel
performance for numeric data by ~60%.
- Deferring bitpacking in big chunks may hurt performance.
- The actual effect depends heavily on the generator function and the compiler.
It is hard to find a general approach that benefits all cases.
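A minimal sketch of how _mm_movemask_epi8 can pack comparison results, assuming x86 with SSE2, int32 inputs, a greater-than comparison, and Arrow's LSB-first bitmap bit order. The function name GreaterThanInt32 and the narrowing through _mm_packs_epi32/_mm_packs_epi16 are illustrative choices, not the contents of movemask.patch. Sixteen comparisons are narrowed to byte masks, then a single movemask gathers their sign bits into two output bytes; a scalar loop handles the tail (or the whole array when SSE2 is unavailable).

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical kernel: bit i of `out` (LSB-first, Arrow-style) = left[i] > right[i].
// `out` must hold at least (length + 7) / 8 bytes; padding bits are zeroed.
void GreaterThanInt32(const int32_t* left, const int32_t* right, int64_t length,
                      uint8_t* out) {
  assert(length >= 0);
  int64_t i = 0;
#if defined(__SSE2__)
  for (; i + 16 <= length; i += 16) {
    __m128i c[4];
    for (int k = 0; k < 4; ++k) {
      __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(left + i + 4 * k));
      __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(right + i + 4 * k));
      c[k] = _mm_cmpgt_epi32(a, b);  // each 32-bit lane becomes 0 or -1
    }
    // Narrow 16 x int32 masks to 16 x int8 masks; signed saturation keeps 0 / -1
    // and the pack order keeps element j in byte j.
    __m128i lo = _mm_packs_epi32(c[0], c[1]);
    __m128i hi = _mm_packs_epi32(c[2], c[3]);
    __m128i bytes = _mm_packs_epi16(lo, hi);
    // Gather the 16 sign bits into one integer: bit j = result for element i + j.
    int mask = _mm_movemask_epi8(bytes);
    out[i / 8] = static_cast<uint8_t>(mask & 0xFF);
    out[i / 8 + 1] = static_cast<uint8_t>((mask >> 8) & 0xFF);
  }
#endif
  // Scalar tail; the SIMD loop always stops on a byte boundary.
  for (; i < length; ++i) {
    if (i % 8 == 0) out[i / 8] = 0;  // start a fresh output byte
    if (left[i] > right[i]) out[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
  }
}
```

Narrowing before the movemask matters for wide types: applied directly to an int32 compare result, _mm_movemask_epi8 would yield four identical bits per element.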

> [C++] Explore alternative strategy for Compare kernel implementation for 
> better performance
> -------------------------------------------------------------------------------------------
>
>                 Key: ARROW-9842
>                 URL: https://issues.apache.org/jira/browse/ARROW-9842
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: 5.0.0
>
>         Attachments: movemask-in-chunks.diff, movemask.patch
>
>
> The compiler may be able to vectorize comparison operations if the bitpacking of
> results is deferred until the end (or done in chunks). Instead of packing bits as
> each comparison is made, a temporary bytemap can be populated chunk by chunk and
> the bytemaps then bitpacked into the output buffer. This may also reduce the code
> size of the compare kernels (which are actually quite large at the moment).
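The deferred strategy described in the issue can be sketched in portable C++ as two passes per chunk: a comparison loop that writes one byte (0 or 1) per element into a small stack buffer, followed by a pack into Arrow's LSB-first bitmap. CompareDeferred, PackBytemap and kChunkSize = 256 are hypothetical names and values, not Arrow's kernel API. The first loop has no bit-level dependency between iterations, which is what gives the compiler room to vectorize it; whether it actually does depends on the compiler and flags, and Yibo's comment notes that overly large chunks may hurt.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical chunk size; a tuning knob, not a recommended value.
constexpr int64_t kChunkSize = 256;
static_assert(kChunkSize % 8 == 0, "each chunk must fill whole output bytes");

// Packs `n` bytes (each 0 or 1) into LSB-first bits starting at out[0].
// Bits past `n` in the last byte are zeroed.
void PackBytemap(const uint8_t* bytemap, int64_t n, uint8_t* out) {
  for (int64_t byte = 0; byte * 8 < n; ++byte) {
    uint8_t packed = 0;
    for (int64_t bit = 0; bit < 8 && byte * 8 + bit < n; ++bit) {
      packed |= static_cast<uint8_t>(bytemap[byte * 8 + bit] << bit);
    }
    out[byte] = packed;
  }
}

// Hypothetical kernel: bit i of `out` = op(left[i], right[i]).
// `out` must hold at least (length + 7) / 8 bytes.
template <typename T, typename Op>
void CompareDeferred(const T* left, const T* right, int64_t length, uint8_t* out,
                     Op op) {
  assert(length >= 0);
  uint8_t bytemap[kChunkSize];
  for (int64_t start = 0; start < length; start += kChunkSize) {
    const int64_t n = std::min(kChunkSize, length - start);
    // Pass 1: one byte per comparison, a candidate for auto-vectorization.
    for (int64_t j = 0; j < n; ++j) {
      bytemap[j] = op(left[start + j], right[start + j]) ? 1 : 0;
    }
    // Pass 2: `start` is a multiple of 8, so this chunk begins on a byte boundary.
    PackBytemap(bytemap, n, out + start / 8);
  }
}
```

For example, CompareDeferred(l, r, n, out, [](int32_t a, int32_t b) { return a > b; }) fills (n + 7) / 8 bytes of out; the same template serves every comparison operator and element type, which is one way the deferred approach could shrink kernel code size.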



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
