aocsa commented on pull request #11019: URL: https://github.com/apache/arrow/pull/11019#issuecomment-911860468
Thanks @lidavidm, @edponce for the feedback. I reduced the code, refactoring it where it was required. > You may want to rebase to fix some of the builds. > > I think you've mentioned benchmarks, are there results to show/is there benchmark code? > Here are some benchmark ([code is here](https://gist.github.com/aocsa/3870345fc9b4bcc28372d3b9f8e8f673)) results for [`partition_nth:intro_select` and `partition_nth:heap`,](https://gist.github.com/aocsa/aca0d8c1863576d6652473062b362295) values are int64 scalars with length from 1 << 20 to 1 << 23 and with 100 of `null_proportion` , `k` is `0,125*length`: `partition_nth:intro_select` ``` NthToIndicesInt64/SelectTop1024/32768/10000/min_time:1.000 336418 ns 336337 ns 4134 bytes_per_second=92.9128M/s items_per_second=12.1783M/s null_percent=0.01 size=32.768k NthToIndicesInt64/SelectTop1024/32768/100/min_time:1.000 431855 ns 431747 ns 3243 bytes_per_second=72.3804M/s items_per_second=9.48705M/s null_percent=1 size=32.768k NthToIndicesInt64/SelectTop1024/32768/10/min_time:1.000 413660 ns 413553 ns 3380 bytes_per_second=75.5646M/s items_per_second=9.9044M/s null_percent=10 size=32.768k NthToIndicesInt64/SelectTop1024/32768/2/min_time:1.000 236539 ns 236480 ns 5940 bytes_per_second=132.147M/s items_per_second=17.3207M/s null_percent=50 size=32.768k NthToIndicesInt64/SelectTop1024/32768/1/min_time:1.000 77369 ns 77349 ns 18102 bytes_per_second=404.011M/s items_per_second=52.9545M/s null_percent=100 size=32.768k NthToIndicesInt64/SelectTop1024/32768/0/min_time:1.000 336092 ns 336009 ns 4160 bytes_per_second=93.0033M/s items_per_second=12.1901M/s null_percent=0 size=32.768k NthToIndicesInt64/SelectTop1024/1048576/100/min_time:1.000 7809018 ns 7807367 ns 179 bytes_per_second=128.084M/s items_per_second=16.7882M/s null_percent=1 size=1048.58k NthToIndicesInt64/SelectTop1024/8388608/100/min_time:1.000 61132755 ns 61121366 ns 23 bytes_per_second=130.887M/s items_per_second=17.1556M/s null_percent=1 size=8.38861M NthToIndicesInt64/SelectTop64/32768/10000/min_time:1.000 244987 ns 244925 ns 5716 bytes_per_second=127.59M/s items_per_second=16.7235M/s null_percent=0.01 size=32.768k NthToIndicesInt64/SelectTop64/32768/100/min_time:1.000 367908 ns 367813 ns 3807 bytes_per_second=84.9616M/s items_per_second=11.1361M/s null_percent=1 size=32.768k NthToIndicesInt64/SelectTop64/32768/10/min_time:1.000 324431 ns 324349 ns 4354 bytes_per_second=96.3469M/s items_per_second=12.6284M/s null_percent=10 size=32.768k NthToIndicesInt64/SelectTop64/32768/2/min_time:1.000 220053 ns 219995 ns 6377 bytes_per_second=142.049M/s items_per_second=18.6186M/s null_percent=50 size=32.768k NthToIndicesInt64/SelectTop64/32768/1/min_time:1.000 80327 ns 80304 ns 17677 bytes_per_second=389.148M/s items_per_second=51.0064M/s null_percent=100 size=32.768k NthToIndicesInt64/SelectTop64/32768/0/min_time:1.000 247086 ns 246991 ns 5561 bytes_per_second=126.523M/s items_per_second=16.5836M/s null_percent=0 size=32.768k NthToIndicesInt64/SelectTop64/1048576/100/min_time:1.000 7572595 ns 7570022 ns 185 bytes_per_second=132.1M/s items_per_second=17.3146M/s null_percent=1 size=1048.58k NthToIndicesInt64/SelectTop64/8388608/100/min_time:1.000 60670175 ns 60652316 ns 23 bytes_per_second=131.899M/s items_per_second=17.2883M/s null_percent=1 size=8.38861M ``` `partition_nth:heap` ``` NthToIndicesInt64/HeapBasedTop1024/32768/10000/min_time:1.000 333859 ns 333737 ns 4180 bytes_per_second=93.6365M/s items_per_second=12.2731M/s null_percent=0.01 size=32.768k NthToIndicesInt64/HeapBasedTop1024/32768/100/min_time:1.000 433951 ns 433841 ns 3261 bytes_per_second=72.031M/s items_per_second=9.44125M/s null_percent=1 size=32.768k NthToIndicesInt64/HeapBasedTop1024/32768/10/min_time:1.000 414340 ns 414237 ns 3390 bytes_per_second=75.4399M/s items_per_second=9.88806M/s null_percent=10 size=32.768k NthToIndicesInt64/HeapBasedTop1024/32768/2/min_time:1.000 238940 ns 238870 ns 5861 bytes_per_second=130.824M/s items_per_second=17.1474M/s null_percent=50 size=32.768k NthToIndicesInt64/HeapBasedTop1024/32768/1/min_time:1.000 79213 ns 79190 ns 17821 bytes_per_second=394.622M/s items_per_second=51.7238M/s null_percent=100 size=32.768k NthToIndicesInt64/HeapBasedTop1024/32768/0/min_time:1.000 337863 ns 337777 ns 4125 bytes_per_second=92.5167M/s items_per_second=12.1263M/s null_percent=0 size=32.768k NthToIndicesInt64/HeapBasedTop1024/1048576/100/min_time:1.000 7984756 ns 7982600 ns 177 bytes_per_second=125.272M/s items_per_second=16.4197M/s null_percent=1 size=1048.58k NthToIndicesInt64/HeapBasedTop1024/8388608/100/min_time:1.000 61850326 ns 61825764 ns 22 bytes_per_second=129.396M/s items_per_second=16.9602M/s null_percent=1 size=8.38861M NthToIndicesInt64/HeapBasedTop64/32768/10000/min_time:1.000 246324 ns 246264 ns 5648 bytes_per_second=126.896M/s items_per_second=16.6326M/s null_percent=0.01 size=32.768k NthToIndicesInt64/HeapBasedTop64/32768/100/min_time:1.000 369914 ns 369818 ns 3778 bytes_per_second=84.5011M/s items_per_second=11.0757M/s null_percent=1 size=32.768k NthToIndicesInt64/HeapBasedTop64/32768/10/min_time:1.000 323143 ns 323059 ns 4286 bytes_per_second=96.7316M/s items_per_second=12.6788M/s null_percent=10 size=32.768k NthToIndicesInt64/HeapBasedTop64/32768/2/min_time:1.000 229702 ns 229619 ns 6399 bytes_per_second=136.095M/s items_per_second=17.8383M/s null_percent=50 size=32.768k NthToIndicesInt64/HeapBasedTop64/32768/1/min_time:1.000 80524 ns 80503 ns 17146 bytes_per_second=388.186M/s items_per_second=50.8804M/s null_percent=100 size=32.768k NthToIndicesInt64/HeapBasedTop64/32768/0/min_time:1.000 248449 ns 248391 ns 5579 bytes_per_second=125.81M/s items_per_second=16.4902M/s null_percent=0 size=32.768k NthToIndicesInt64/HeapBasedTop64/1048576/100/min_time:1.000 8001923 ns 7998633 ns 181 bytes_per_second=125.021M/s items_per_second=16.3868M/s null_percent=1 size=1048.58k NthToIndicesInt64/HeapBasedTop64/8388608/100/min_time:1.000 61181748 ns 61162773 ns 23 bytes_per_second=130.799M/s items_per_second=17.144M/s null_percent=1 size=8.38861M ``` Some conclusions: Both implementations (`partition_nth:intro_select` and `partition_nth:heap`) show similar numbers. PartitionNthIndices uses std::nth_element which uses internally a special algorithm called [introselect](https://github.com/tupipa/sva_freebsd90/blob/640747eb65a5723250fefc612de48694570292d1/contrib/libstdc%2B%2B/include/bits/stl_algo.h#L3945), this is a is a selection algorithm that is a hybrid of quickselect and heap selection. This PR started as a way to speed top-k queries, https://lemire.me/blog/2017/06/21/top-speed-for-top-k-queries/ using `FastPriorityQueue-KWillets-replaceTop` approach. However C++ implementation for std::nth_element has the best of quickselect and heap selection as an hybrid solution. In the next benchmarks I will show the peformance when the algorithm needs to be stable to enable `keep=first` and `keep=last`. I will explore ways to implement a stable binary heap to do that and compare with {std::nth_element + std::stable_sort}. -- <br class="Apple-interchange-newline"> > Should SelectK instead be an aggregate function? I am not sure, I am new with arrow. And this function looks like pretty similar to the APIs of SortIndices and PartitionNthIndices So I think it is in the right place. Looking forward to your thoughts. cc @lidavidm -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
