aocsa commented on pull request #11019:
URL: https://github.com/apache/arrow/pull/11019#issuecomment-911860468


   Thanks @lidavidm, @edponce  for the feedback.  I reduced the code, 
refactoring it where it was required.   
   
   > You may want to rebase to fix some of the builds.
   > 
   > I think you've mentioned benchmarks, are there results to show/is there 
benchmark code?
   > 
   
   Here are some benchmark ([code is 
here](https://gist.github.com/aocsa/3870345fc9b4bcc28372d3b9f8e8f673)) results 
for [`partition_nth:intro_select` and 
`partition_nth:heap`,](https://gist.github.com/aocsa/aca0d8c1863576d6652473062b362295)
 values are int64 scalars with length  from 1 << 20 to 1 << 23 and with 100 of 
`null_proportion` , `k` is `0,125*length`: 
   
   `partition_nth:intro_select`
   ```
   NthToIndicesInt64/SelectTop1024/32768/10000/min_time:1.000        336418 ns  
     336337 ns         4134 bytes_per_second=92.9128M/s 
items_per_second=12.1783M/s null_percent=0.01 size=32.768k
   NthToIndicesInt64/SelectTop1024/32768/100/min_time:1.000          431855 ns  
     431747 ns         3243 bytes_per_second=72.3804M/s 
items_per_second=9.48705M/s null_percent=1 size=32.768k
   NthToIndicesInt64/SelectTop1024/32768/10/min_time:1.000           413660 ns  
     413553 ns         3380 bytes_per_second=75.5646M/s 
items_per_second=9.9044M/s null_percent=10 size=32.768k
   NthToIndicesInt64/SelectTop1024/32768/2/min_time:1.000            236539 ns  
     236480 ns         5940 bytes_per_second=132.147M/s 
items_per_second=17.3207M/s null_percent=50 size=32.768k
   NthToIndicesInt64/SelectTop1024/32768/1/min_time:1.000             77369 ns  
      77349 ns        18102 bytes_per_second=404.011M/s 
items_per_second=52.9545M/s null_percent=100 size=32.768k
   NthToIndicesInt64/SelectTop1024/32768/0/min_time:1.000            336092 ns  
     336009 ns         4160 bytes_per_second=93.0033M/s 
items_per_second=12.1901M/s null_percent=0 size=32.768k
   NthToIndicesInt64/SelectTop1024/1048576/100/min_time:1.000       7809018 ns  
    7807367 ns          179 bytes_per_second=128.084M/s 
items_per_second=16.7882M/s null_percent=1 size=1048.58k
   NthToIndicesInt64/SelectTop1024/8388608/100/min_time:1.000      61132755 ns  
   61121366 ns           23 bytes_per_second=130.887M/s 
items_per_second=17.1556M/s null_percent=1 size=8.38861M
   
   NthToIndicesInt64/SelectTop64/32768/10000/min_time:1.000          244987 ns  
     244925 ns         5716 bytes_per_second=127.59M/s 
items_per_second=16.7235M/s null_percent=0.01 size=32.768k
   NthToIndicesInt64/SelectTop64/32768/100/min_time:1.000            367908 ns  
     367813 ns         3807 bytes_per_second=84.9616M/s 
items_per_second=11.1361M/s null_percent=1 size=32.768k
   NthToIndicesInt64/SelectTop64/32768/10/min_time:1.000             324431 ns  
     324349 ns         4354 bytes_per_second=96.3469M/s 
items_per_second=12.6284M/s null_percent=10 size=32.768k
   NthToIndicesInt64/SelectTop64/32768/2/min_time:1.000              220053 ns  
     219995 ns         6377 bytes_per_second=142.049M/s 
items_per_second=18.6186M/s null_percent=50 size=32.768k
   NthToIndicesInt64/SelectTop64/32768/1/min_time:1.000               80327 ns  
      80304 ns        17677 bytes_per_second=389.148M/s 
items_per_second=51.0064M/s null_percent=100 size=32.768k
   NthToIndicesInt64/SelectTop64/32768/0/min_time:1.000              247086 ns  
     246991 ns         5561 bytes_per_second=126.523M/s 
items_per_second=16.5836M/s null_percent=0 size=32.768k
   NthToIndicesInt64/SelectTop64/1048576/100/min_time:1.000         7572595 ns  
    7570022 ns          185 bytes_per_second=132.1M/s 
items_per_second=17.3146M/s null_percent=1 size=1048.58k
   NthToIndicesInt64/SelectTop64/8388608/100/min_time:1.000        60670175 ns  
   60652316 ns           23 bytes_per_second=131.899M/s 
items_per_second=17.2883M/s null_percent=1 size=8.38861M
   ```
   
   `partition_nth:heap`
   ```
   
   
   NthToIndicesInt64/HeapBasedTop1024/32768/10000/min_time:1.000     333859 ns  
     333737 ns         4180 bytes_per_second=93.6365M/s 
items_per_second=12.2731M/s null_percent=0.01 size=32.768k
   NthToIndicesInt64/HeapBasedTop1024/32768/100/min_time:1.000       433951 ns  
     433841 ns         3261 bytes_per_second=72.031M/s 
items_per_second=9.44125M/s null_percent=1 size=32.768k
   NthToIndicesInt64/HeapBasedTop1024/32768/10/min_time:1.000        414340 ns  
     414237 ns         3390 bytes_per_second=75.4399M/s 
items_per_second=9.88806M/s null_percent=10 size=32.768k
   NthToIndicesInt64/HeapBasedTop1024/32768/2/min_time:1.000         238940 ns  
     238870 ns         5861 bytes_per_second=130.824M/s 
items_per_second=17.1474M/s null_percent=50 size=32.768k
   NthToIndicesInt64/HeapBasedTop1024/32768/1/min_time:1.000          79213 ns  
      79190 ns        17821 bytes_per_second=394.622M/s 
items_per_second=51.7238M/s null_percent=100 size=32.768k
   NthToIndicesInt64/HeapBasedTop1024/32768/0/min_time:1.000         337863 ns  
     337777 ns         4125 bytes_per_second=92.5167M/s 
items_per_second=12.1263M/s null_percent=0 size=32.768k
   NthToIndicesInt64/HeapBasedTop1024/1048576/100/min_time:1.000    7984756 ns  
    7982600 ns          177 bytes_per_second=125.272M/s 
items_per_second=16.4197M/s null_percent=1 size=1048.58k
   NthToIndicesInt64/HeapBasedTop1024/8388608/100/min_time:1.000   61850326 ns  
   61825764 ns           22 bytes_per_second=129.396M/s 
items_per_second=16.9602M/s null_percent=1 size=8.38861M
   
   NthToIndicesInt64/HeapBasedTop64/32768/10000/min_time:1.000       246324 ns  
     246264 ns         5648 bytes_per_second=126.896M/s 
items_per_second=16.6326M/s null_percent=0.01 size=32.768k
   NthToIndicesInt64/HeapBasedTop64/32768/100/min_time:1.000         369914 ns  
     369818 ns         3778 bytes_per_second=84.5011M/s 
items_per_second=11.0757M/s null_percent=1 size=32.768k
   NthToIndicesInt64/HeapBasedTop64/32768/10/min_time:1.000          323143 ns  
     323059 ns         4286 bytes_per_second=96.7316M/s 
items_per_second=12.6788M/s null_percent=10 size=32.768k
   NthToIndicesInt64/HeapBasedTop64/32768/2/min_time:1.000           229702 ns  
     229619 ns         6399 bytes_per_second=136.095M/s 
items_per_second=17.8383M/s null_percent=50 size=32.768k
   NthToIndicesInt64/HeapBasedTop64/32768/1/min_time:1.000            80524 ns  
      80503 ns        17146 bytes_per_second=388.186M/s 
items_per_second=50.8804M/s null_percent=100 size=32.768k
   NthToIndicesInt64/HeapBasedTop64/32768/0/min_time:1.000           248449 ns  
     248391 ns         5579 bytes_per_second=125.81M/s 
items_per_second=16.4902M/s null_percent=0 size=32.768k
   NthToIndicesInt64/HeapBasedTop64/1048576/100/min_time:1.000      8001923 ns  
    7998633 ns          181 bytes_per_second=125.021M/s 
items_per_second=16.3868M/s null_percent=1 size=1048.58k
   NthToIndicesInt64/HeapBasedTop64/8388608/100/min_time:1.000     61181748 ns  
   61162773 ns           23 bytes_per_second=130.799M/s 
items_per_second=17.144M/s null_percent=1 size=8.38861M
   
   ```
   Some conclusions: Both implementations (`partition_nth:intro_select` and 
`partition_nth:heap`) show similar numbers.  PartitionNthIndices uses 
std::nth_element which uses internally a special algorithm called 
[introselect](https://github.com/tupipa/sva_freebsd90/blob/640747eb65a5723250fefc612de48694570292d1/contrib/libstdc%2B%2B/include/bits/stl_algo.h#L3945),
  this is a  is a selection algorithm that is a hybrid of quickselect and heap 
selection.  
   
   This PR started  as a way to speed top-k queries, 
https://lemire.me/blog/2017/06/21/top-speed-for-top-k-queries/ using 
`FastPriorityQueue-KWillets-replaceTop` approach.  However C++ implementation 
for std::nth_element  has the best of quickselect and heap selection as an 
hybrid solution.
   
   In the next  benchmarks I will show the peformance when the algorithm needs 
to be stable to enable `keep=first` and `keep=last`. I will explore ways to 
implement a stable binary heap to do that and compare with  {std::nth_element + 
std::stable_sort}. 
   
   --
   <br class="Apple-interchange-newline">
   
   > Should SelectK instead be an aggregate function?
   
   I am not sure, I am new with arrow. And this function looks like pretty 
similar to the APIs of SortIndices and PartitionNthIndices So I think it is in 
the right place.  
   
   Looking forward to your thoughts. cc @lidavidm 
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to