[
https://issues.apache.org/jira/browse/ARROW-14183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510085#comment-17510085
]
Alvin Chunga Mamani commented on ARROW-14183:
---------------------------------------------
In [#12582|https://github.com/apache/arrow/pull/12582] was implemented a
RadixSort per batches(taking advantage of contiguous elements in memory as same
of table-sort) + Merge K first Elements, and the results seems to be equal to
the current implementation of use a Heap (just a ~5% more fast in some cases).
With this implementation the RadixSort takes most of the time because it sort
in batches the whole elements, instead of that I can implement just a Merge K
first elements and for small values of K against thousands/millions of
elements, but maybe it will be same complexity of the Heap, because it will
sort N logK until K < length(current_block)
CC: [~lidavidm] [~apitrou]
> [C++] Improve select_k_unstable performance
> -------------------------------------------
>
> Key: ARROW-14183
> URL: https://issues.apache.org/jira/browse/ARROW-14183
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: David Li
> Assignee: Alvin Chunga Mamani
> Priority: Major
> Labels: kernel, pull-request-available
> Attachments: image-2022-02-16-02-35-42-818.png
>
> Time Spent: 7h 10m
> Remaining Estimate: 0h
>
> Recently {{sort_indices}} was optimized and refactored in ARROW-14165. One of
> those optimizations could also be applied to {{select_k_unstable}}'s
> implementation.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)