[
https://issues.apache.org/jira/browse/ARROW-14183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493038#comment-17493038
]
Alvin Chunga Mamani commented on ARROW-14183:
---------------------------------------------
The same optimization used in sort of tables was applied. The old
implementation that used a fixed heap of size K was changed by doing a sort by
batches, merges and selecting the first K indices, but there is no change in
terms of improvement.
!image-2022-02-16-02-35-42-818.png!
What could be optimized for an implementation for small K using only a merge of
the first K elements per block, and by having a K=10 for example at the time of
merging, only the first 10 elements of each block would be merged. And for a
slightly larger K, apply the normal merge of the entire block until the total
size of the block is less than K, then it would proceed to only mege K elements.
> [C++] Improve select_k_unstable performance
> -------------------------------------------
>
> Key: ARROW-14183
> URL: https://issues.apache.org/jira/browse/ARROW-14183
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: David Li
> Assignee: Alvin Chunga Mamani
> Priority: Major
> Labels: kernel, pull-request-available
> Attachments: image-2022-02-16-02-35-42-818.png
>
> Time Spent: 4h 40m
> Remaining Estimate: 0h
>
> Recently {{sort_indices}} was optimized and refactored in ARROW-14165. One of
> those optimizations could also be applied to {{select_k_unstable}}'s
> implementation.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)