[ 
https://issues.apache.org/jira/browse/ARROW-14183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493038#comment-17493038
 ] 

Alvin Chunga Mamani commented on ARROW-14183:
---------------------------------------------

The same optimization used in sort of tables was applied. The old 
implementation that used a fixed heap of size K was changed by doing a sort by 
batches, merges and selecting the first K indices, but there is no change in 
terms of improvement.

!image-2022-02-16-02-35-42-818.png!
What could be optimized for an implementation for small K using only a merge of 
the first K elements per block, and by having a K=10 for example at the time of 
merging, only the first 10 elements of each block would be merged. And for a 
slightly larger K, apply the normal merge of the entire block until the total 
size of the block is less than K, then it would proceed to only mege K elements.

> [C++] Improve select_k_unstable performance
> -------------------------------------------
>
>                 Key: ARROW-14183
>                 URL: https://issues.apache.org/jira/browse/ARROW-14183
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: David Li
>            Assignee: Alvin Chunga Mamani
>            Priority: Major
>              Labels: kernel, pull-request-available
>         Attachments: image-2022-02-16-02-35-42-818.png
>
>          Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Recently {{sort_indices}} was optimized and refactored in ARROW-14165. One of 
> those optimizations could also be applied to {{select_k_unstable}}'s 
> implementation.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to