[ 
https://issues.apache.org/jira/browse/ARROW-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186213#comment-17186213
 ] 

Yibo Cai edited comment on ARROW-9873 at 9/1/20, 5:02 AM:
----------------------------------------------------------

Maybe we can use a counting method as the first step, then scan the counter 
array and insert the results into a map at the end. I guess this won't cause 
much performance loss, as the map is small and we can reserve buckets first. 
Will do some tests.
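
A rough sketch of the two steps described above (count into a value-indexed 
array, then scan it and insert into a map with buckets reserved). This is not 
the actual Arrow kernel code: the function name CountValuesInRange and its 
signature are hypothetical, and it assumes the caller already knows that every 
value lies within [min, max].

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical helper for illustration only, not part of Arrow.
// Assumes every element of `values` lies within [min, max] and that
// the range is small enough for a dense counter array.
template <typename T>
std::unordered_map<T, int64_t> CountValuesInRange(const std::vector<T>& values,
                                                  T min, T max) {
  const int64_t lo = static_cast<int64_t>(min);
  const std::size_t range =
      static_cast<std::size_t>(static_cast<int64_t>(max) - lo) + 1;

  // Step 1: count occurrences in a dense array indexed by (value - min).
  std::vector<int64_t> counters(range, 0);
  for (T v : values) {
    assert(v >= min && v <= max);
    ++counters[static_cast<std::size_t>(static_cast<int64_t>(v) - lo)];
  }

  // Step 2: scan the counter array and insert non-zero entries into a map.
  // Buckets are reserved up front, so no rehashing happens during inserts.
  std::unordered_map<T, int64_t> result;
  result.reserve(range);
  for (std::size_t i = 0; i < range; ++i) {
    if (counters[i] != 0) {
      result.emplace(static_cast<T>(lo + static_cast<int64_t>(i)), counters[i]);
    }
  }
  return result;
}
```

The map has at most (max - min + 1) entries, so the final scan and insert step 
is cheap compared with the counting pass over a large input array.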

Test results with the existing benchmark (values within -100~100, array size 
of 1M bytes):
- Small performance drop (< 10%) for Boolean and Int8.
- About 2x performance improvement for Int16/32/64 with limited value range.

Varying the value range and array size shows a consistent performance 
improvement.



> [C++][Compute] Improve mode kernel for integers within limited value range
> --------------------------------------------------------------------------
>
>                 Key: ARROW-9873
>                 URL: https://issues.apache.org/jira/browse/ARROW-9873
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Yibo Cai
>            Assignee: Yibo Cai
>            Priority: Major
>         Attachments: mode-range-skylake.png
>
>
> It's possible to improve mode kernel performance for integers within a 
> limited value range by using a value-indexed array instead of a general 
> hash table. A similar trick is used in the sorting kernel (ARROW-1571).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
