aocsa commented on pull request #11019: URL: https://github.com/apache/arrow/pull/11019#issuecomment-914419100
> > Some conclusions: Both implementations (partition_nth:intro_select and partition_nth:heap) show similar numbers. > > Interesting. Did you double-check that no mistake was made when running the benchmarks (i.e. are you really comparing the two different algorithms)? > My bad, You were right @pitrou the numbers are not correct. Something happenned when I ran this [bench](https://gist.github.com/aocsa/3870345fc9b4bcc28372d3b9f8e8f673) both at the same time. I ran again PartitionNthToIndices with different strategies (nth_element vs heap_based) separetly and I got these numbers: `PartitionNthToIndices_nth_element` ``` -------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... -------------------------------------------------------------------------------------------------- BottomKInt64/32768/10000/min_time:1.000 8558 ns 8556 ns 166021 bytes_per_second=3.56686G/s items_per_second=478.735M/s null_percent=0.01 size=32.768k BottomKInt64/32768/100/min_time:1.000 20545 ns 20542 ns 63571 bytes_per_second=1.4856G/s items_per_second=199.394M/s null_percent=1 size=32.768k BottomKInt64/32768/10/min_time:1.000 17864 ns 17862 ns 72315 bytes_per_second=1.70853G/s items_per_second=229.316M/s null_percent=10 size=32.768k BottomKInt64/32768/2/min_time:1.000 11565 ns 11564 ns 118026 bytes_per_second=2.63912G/s items_per_second=354.217M/s null_percent=50 size=32.768k BottomKInt64/32768/1/min_time:1.000 6317 ns 6316 ns 214862 bytes_per_second=4.83149G/s items_per_second=648.472M/s null_percent=100 size=32.768k BottomKInt64/32768/0/min_time:1.000 8407 ns 8406 ns 164849 bytes_per_second=3.63065G/s items_per_second=487.297M/s null_percent=0 size=32.768k BottomKInt64/1048576/100/min_time:1.000 916082 ns 915966 ns 1526 bytes_per_second=1091.74M/s items_per_second=143.097M/s null_percent=1 size=1048.58k BottomKInt64/8388608/100/min_time:1.000 7874118 ns 7871600 ns 178 bytes_per_second=1016.31M/s items_per_second=133.21M/s null_percent=1 size=8.38861M ``` `PartitionNthToIndices_heap_based` ``` -------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... -------------------------------------------------------------------------------------------------- BottomKInt64/32768/10000/min_time:1.000 55314 ns 55303 ns 25080 bytes_per_second=565.064M/s items_per_second=74.0641M/s null_percent=0.01 size=32.768k BottomKInt64/32768/100/min_time:1.000 57577 ns 57568 ns 23965 bytes_per_second=542.835M/s items_per_second=71.1505M/s null_percent=1 size=32.768k BottomKInt64/32768/10/min_time:1.000 59858 ns 59847 ns 22977 bytes_per_second=522.168M/s items_per_second=68.4416M/s null_percent=10 size=32.768k BottomKInt64/32768/2/min_time:1.000 54008 ns 53995 ns 24750 bytes_per_second=578.753M/s items_per_second=75.8583M/s null_percent=50 size=32.768k BottomKInt64/32768/1/min_time:1.000 5929 ns 5927 ns 235385 bytes_per_second=5.14856G/s items_per_second=691.028M/s null_percent=100 size=32.768k BottomKInt64/32768/0/min_time:1.000 55751 ns 55741 ns 24636 bytes_per_second=560.626M/s items_per_second=73.4823M/s null_percent=0 size=32.768k BottomKInt64/1048576/100/min_time:1.000 4542982 ns 4541548 ns 323 bytes_per_second=220.189M/s items_per_second=28.8606M/s null_percent=1 size=1048.58k BottomKInt64/8388608/100/min_time:1.000 53349325 ns 53333611 ns 26 bytes_per_second=149.999M/s items_per_second=19.6607M/s null_percent=1 size=8.38861M ``` > > IMO, implementation of stable algorithms need more exploration, and it could be implemented in a separate follow up JIRA issue > > Hmm, why not, but `sort_indices` and `partition_nth_indices` are guaranteed to be stable, so it would be better if topK/bottomK wasn't inconsistent. Note that [both algorithms](https://gist.github.com/aocsa/7a2094159e65c69b5e84c694498f03bd) uses `NonStablePartitioner` to separete Nulls and both are unstable. Clearly, `PartitionNthToIndices_nth_element` is faster (5-6x) but `PartitionNthToIndices` kernel is not enough to implement `TopK/BottomK` because the first `K` elements needs to be sorted. Besides in order to Implement a stable version most probably we will need to use `StablePartitioner + std::stable_partition + std::stable_sort`. However the attraction of the heap-based implementation is to have a streaming algorithm. IMO this PR should first expose a `select_k_unstable` and look in a follow up PR the best implementation for `select_k_stable`. Looking forward to your thoughts. cc @lidavidm, @pitrou -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
