[
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17341526#comment-17341526
]
Kirill Lykov edited comment on ARROW-10899 at 5/9/21, 2:56 PM:
---------------------------------------------------------------
Do we have some information about distribution of integers? And also are they
signed/unsigned?
I've did some more benchmarking and plots
[https://github.com/KirillLykov/int-sort-bmk.|https://github.com/KirillLykov/int-sort-bmk]
In short, radix sort is almost always performs better than std::stable_sort.
But there are some cases when stable_sort is better – sorted sequence, all
elements are equal.
Another observation is that it looks like LSD implementation is faster than
MSD implementation.
Yet it is possible to have almost the same performance using MSD/LSD hybrid
algorithm if we think that MSD is better for memory bandwidth.
The question is how to measure memory bandwidth consumption, I employed
LLC-cache-misses events.
was (Author: klykov):
Do we have some information about distribution of integers? And also are they
signed/unsigned?
I've did some more benchmarking and plots [int-sort-bmks
repo|[https://github.com/KirillLykov/int-sort-bmk].]
In short, radix sort is almost always performs better than std::stable_sort.
But there are some cases when stable_sort is better – sorted sequence, all
elements are equal.
Another observation is that it looks like LSD implementation is faster than
MSD implementation.
Yet it is possible to have almost the same performance using MSD/LSD hybrid
algorithm if we think that MSD is better for memory bandwidth.
The question is how to measure memory bandwidth consumption, I employed
LLC-cache-misses events.
> [C++] Investigate radix sort for integer arrays
> -----------------------------------------------
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
> Issue Type: Wish
> Components: C++
> Reporter: Antoine Pitrou
> Assignee: Kirill Lykov
> Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png,
> all_random_wholeRange.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable
> sort. It may be faster to use a radix sort instead.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)