aocsa commented on pull request #11019:
URL: https://github.com/apache/arrow/pull/11019#issuecomment-914419100


   > > Some conclusions: Both implementations (partition_nth:intro_select and 
partition_nth:heap) show similar numbers.
   > 
   > Interesting. Did you double-check that no mistake was made when running 
the benchmarks (i.e. are you really comparing the two different algorithms)?
   > 
   
   My bad, You were right  @pitrou  the numbers are not correct.   Something 
happenned when I ran this 
[bench](https://gist.github.com/aocsa/3870345fc9b4bcc28372d3b9f8e8f673) both at 
the same time.  I ran again PartitionNthToIndices with different strategies 
(nth_element vs heap_based) separetly and I got these numbers: 
   
   `PartitionNthToIndices_nth_element`
   ```
   
--------------------------------------------------------------------------------------------------
   Benchmark                                        Time             CPU   
Iterations UserCounters...
   
--------------------------------------------------------------------------------------------------
   BottomKInt64/32768/10000/min_time:1.000       8558 ns         8556 ns       
166021 bytes_per_second=3.56686G/s items_per_second=478.735M/s 
null_percent=0.01 size=32.768k
   BottomKInt64/32768/100/min_time:1.000        20545 ns        20542 ns        
63571 bytes_per_second=1.4856G/s items_per_second=199.394M/s null_percent=1 
size=32.768k
   BottomKInt64/32768/10/min_time:1.000         17864 ns        17862 ns        
72315 bytes_per_second=1.70853G/s items_per_second=229.316M/s null_percent=10 
size=32.768k
   BottomKInt64/32768/2/min_time:1.000          11565 ns        11564 ns       
118026 bytes_per_second=2.63912G/s items_per_second=354.217M/s null_percent=50 
size=32.768k
   BottomKInt64/32768/1/min_time:1.000           6317 ns         6316 ns       
214862 bytes_per_second=4.83149G/s items_per_second=648.472M/s null_percent=100 
size=32.768k
   BottomKInt64/32768/0/min_time:1.000           8407 ns         8406 ns       
164849 bytes_per_second=3.63065G/s items_per_second=487.297M/s null_percent=0 
size=32.768k
   BottomKInt64/1048576/100/min_time:1.000     916082 ns       915966 ns        
 1526 bytes_per_second=1091.74M/s items_per_second=143.097M/s null_percent=1 
size=1048.58k
   BottomKInt64/8388608/100/min_time:1.000    7874118 ns      7871600 ns        
  178 bytes_per_second=1016.31M/s items_per_second=133.21M/s null_percent=1 
size=8.38861M
   ``` 
   `PartitionNthToIndices_heap_based`
   
   ```
   
--------------------------------------------------------------------------------------------------
   Benchmark                                        Time             CPU   
Iterations UserCounters...
   
--------------------------------------------------------------------------------------------------
   BottomKInt64/32768/10000/min_time:1.000      55314 ns        55303 ns        
25080 bytes_per_second=565.064M/s items_per_second=74.0641M/s null_percent=0.01 
size=32.768k
   BottomKInt64/32768/100/min_time:1.000        57577 ns        57568 ns        
23965 bytes_per_second=542.835M/s items_per_second=71.1505M/s null_percent=1 
size=32.768k
   BottomKInt64/32768/10/min_time:1.000         59858 ns        59847 ns        
22977 bytes_per_second=522.168M/s items_per_second=68.4416M/s null_percent=10 
size=32.768k
   BottomKInt64/32768/2/min_time:1.000          54008 ns        53995 ns        
24750 bytes_per_second=578.753M/s items_per_second=75.8583M/s null_percent=50 
size=32.768k
   BottomKInt64/32768/1/min_time:1.000           5929 ns         5927 ns       
235385 bytes_per_second=5.14856G/s items_per_second=691.028M/s null_percent=100 
size=32.768k
   BottomKInt64/32768/0/min_time:1.000          55751 ns        55741 ns        
24636 bytes_per_second=560.626M/s items_per_second=73.4823M/s null_percent=0 
size=32.768k
   BottomKInt64/1048576/100/min_time:1.000    4542982 ns      4541548 ns        
  323 bytes_per_second=220.189M/s items_per_second=28.8606M/s null_percent=1 
size=1048.58k
   BottomKInt64/8388608/100/min_time:1.000   53349325 ns     53333611 ns        
   26 bytes_per_second=149.999M/s items_per_second=19.6607M/s null_percent=1 
size=8.38861M
   ```
   > > IMO, implementation of stable algorithms need more exploration, and it 
could be implemented in a separate follow up JIRA issue
   > 
   > Hmm, why not, but `sort_indices` and `partition_nth_indices` are 
guaranteed to be stable, so it would be better if topK/bottomK wasn't 
inconsistent.
   
   Note that [both 
algorithms](https://gist.github.com/aocsa/7a2094159e65c69b5e84c694498f03bd) 
uses `NonStablePartitioner` to separete Nulls and both are unstable.
   Clearly, `PartitionNthToIndices_nth_element` is faster (5-6x) but 
`PartitionNthToIndices` kernel is not enough to implement `TopK/BottomK` 
because the first `K` elements needs to be sorted. Besides in order to 
Implement a stable version most probably we will need to use   
`StablePartitioner + std::stable_partition + std::stable_sort`.  However  the 
attraction of  the heap-based implementation  is to have a streaming algorithm. 
   
   IMO this PR should first  expose a `select_k_unstable` and look in a follow 
up PR the best implementation for `select_k_stable`. 
   Looking forward to your thoughts. cc @lidavidm, @pitrou 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to