pitrou commented on PR #13334:
URL: https://github.com/apache/arrow/pull/13334#issuecomment-1191582140

   @ArianaVillegas Ok, I took a look and then pushed an update. Things I added:
   * a randomized test checking that sorting a dict array gives results
identical to sorting the decoded dict array
   * some benchmarks
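
To illustrate the property that the randomized test checks, here is a minimal pure-Python sketch. It is not the Arrow C++ test from this PR: the helper names (`sort_indices_plain`, `sort_indices_dict`, `decode`, `check_random`) are hypothetical, a dictionary array is modelled as a list of codes plus a list of values, nulls are assumed to sort last, and the dictionary itself is assumed to contain no nulls.

```python
import random


def decode(indices, dictionary):
    """Expand dictionary codes into plain values (None stays None)."""
    return [None if code is None else dictionary[code] for code in indices]


def sort_indices_plain(values):
    """Stable argsort of a plain list; nulls (None) are placed last."""
    non_null = [i for i, v in enumerate(values) if v is not None]
    nulls = [i for i, v in enumerate(values) if v is None]
    non_null.sort(key=lambda i: values[i])  # list.sort is stable
    return non_null + nulls


def sort_indices_dict(indices, dictionary):
    """Stable argsort of a dictionary-encoded list without decoding it.

    Each dictionary entry gets a rank from the sorted distinct values, so
    duplicate dictionary entries share a rank; rows are then sorted by the
    rank of their code instead of by the value itself.
    """
    rank_of_value = {v: r for r, v in enumerate(sorted(set(dictionary)))}
    ranks = [rank_of_value[v] for v in dictionary]
    non_null = [i for i, code in enumerate(indices) if code is not None]
    nulls = [i for i, code in enumerate(indices) if code is None]
    non_null.sort(key=lambda i: ranks[indices[i]])
    return non_null + nulls


def check_random(seed, length=1000, dict_size=20, null_probability=0.1):
    """Return True if both sorts agree on one randomly generated input."""
    rng = random.Random(seed)
    # The dictionary may contain duplicate values on purpose.
    dictionary = [rng.randint(-50, 50) for _ in range(dict_size)]
    indices = [
        None if rng.random() < null_probability else rng.randrange(dict_size)
        for _ in range(length)
    ]
    expected = sort_indices_plain(decode(indices, dictionary))
    return sort_indices_dict(indices, dictionary) == expected
```

Calling `check_random(seed)` over a range of seeds exercises different dictionaries and null patterns. Because both sorts are stable, ties between equal values resolve by original position in both paths, which is what makes an exact comparison of the index lists meaningful.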
   
   Here are the benchmark results:
   * on integers:
   ```
   ArraySortIndicesInt64Narrow/32768/10000      12363 ns        12360 ns        55108 bytes_per_second=2.46911G/s items_per_second=331.399M/s null_percent=0.01 size=32.768k
   ArraySortIndicesInt64Narrow/32768/100        12813 ns        12810 ns        55023 bytes_per_second=2.38237G/s items_per_second=319.757M/s null_percent=1 size=32.768k
   ArraySortIndicesInt64Narrow/32768/10         14286 ns        14282 ns        48626 bytes_per_second=2.13672G/s items_per_second=286.786M/s null_percent=10 size=32.768k
   ArraySortIndicesInt64Narrow/32768/2          25929 ns        25923 ns        26800 bytes_per_second=1.17725G/s items_per_second=158.007M/s null_percent=50 size=32.768k
   ArraySortIndicesInt64Narrow/32768/1           6451 ns         6450 ns       102428 bytes_per_second=4.73167G/s items_per_second=635.074M/s null_percent=100 size=32.768k
   ArraySortIndicesInt64Narrow/32768/0          12167 ns        12165 ns        56572 bytes_per_second=2.50871G/s items_per_second=336.713M/s null_percent=0 size=32.768k
   ArraySortIndicesInt64Narrow/1048576/100     416093 ns       416010 ns         1686 bytes_per_second=2.34745G/s items_per_second=315.069M/s null_percent=1 size=1048.58k
   ArraySortIndicesInt64Narrow/8388608/100    5398365 ns      5395214 ns          126 bytes_per_second=1.44804G/s items_per_second=194.353M/s null_percent=1 size=8.38861M
   ArraySortIndicesInt64Wide/32768/10000       196789 ns       196744 ns         3560 bytes_per_second=158.836M/s items_per_second=20.8189M/s null_percent=0.01 size=32.768k
   ArraySortIndicesInt64Wide/32768/100         200730 ns       200687 ns         3475 bytes_per_second=155.715M/s items_per_second=20.4098M/s null_percent=1 size=32.768k
   ArraySortIndicesInt64Wide/32768/10          190690 ns       190608 ns         3677 bytes_per_second=163.949M/s items_per_second=21.4891M/s null_percent=10 size=32.768k
   ArraySortIndicesInt64Wide/32768/2           117346 ns       117323 ns         5847 bytes_per_second=266.36M/s items_per_second=34.9123M/s null_percent=50 size=32.768k
   ArraySortIndicesInt64Wide/32768/1             6433 ns         6430 ns       108754 bytes_per_second=4.74627G/s items_per_second=637.034M/s null_percent=100 size=32.768k
   ArraySortIndicesInt64Wide/32768/0           195827 ns       195786 ns         3550 bytes_per_second=159.613M/s items_per_second=20.9208M/s null_percent=0 size=32.768k
   ArraySortIndicesInt64Wide/1048576/100      9844567 ns      9842802 ns           71 bytes_per_second=101.597M/s items_per_second=13.3165M/s null_percent=1 size=1048.58k
   ArraySortIndicesInt64Wide/8388608/100    103982502 ns    103946729 ns            7 bytes_per_second=76.9625M/s items_per_second=10.0876M/s null_percent=1 size=8.38861M
   ArraySortIndicesInt64Dict/32768/10000       158179 ns       158147 ns         4410 bytes_per_second=197.601M/s items_per_second=25.9M/s null_percent=0.01 size=32.768k
   ArraySortIndicesInt64Dict/32768/100         200484 ns       200443 ns         3480 bytes_per_second=155.905M/s items_per_second=20.4348M/s null_percent=1 size=32.768k
   ArraySortIndicesInt64Dict/32768/10          175254 ns       175219 ns         3964 bytes_per_second=178.348M/s items_per_second=23.3764M/s null_percent=10 size=32.768k
   ArraySortIndicesInt64Dict/32768/2           117257 ns       117231 ns         5850 bytes_per_second=266.567M/s items_per_second=34.9395M/s null_percent=50 size=32.768k
   ArraySortIndicesInt64Dict/32768/1            72865 ns        72848 ns         9395 bytes_per_second=428.973M/s items_per_second=56.2263M/s null_percent=100 size=32.768k
   ArraySortIndicesInt64Dict/32768/0           155398 ns       155327 ns         4417 bytes_per_second=201.188M/s items_per_second=26.3702M/s null_percent=0 size=32.768k
   ArraySortIndicesInt64Dict/1048576/100      7351865 ns      7350538 ns           95 bytes_per_second=136.044M/s items_per_second=17.8316M/s null_percent=1 size=1048.58k
   ArraySortIndicesInt64Dict/8388608/100     68320713 ns     68298178 ns           10 bytes_per_second=117.133M/s items_per_second=15.3529M/s null_percent=1 size=8.38861M
   ```
   * on strings:
   ```
   ArraySortIndicesStrings/32768/10000         392124 ns       387950 ns         1797 bytes_per_second=80.5516M/s items_per_second=10.5581M/s null_percent=0.01 size=32.768k
   ArraySortIndicesStrings/32768/100           381836 ns       381706 ns         1771 bytes_per_second=81.8694M/s items_per_second=10.7308M/s null_percent=1 size=32.768k
   ArraySortIndicesStrings/32768/10            361832 ns       361370 ns         1989 bytes_per_second=86.4765M/s items_per_second=11.3347M/s null_percent=10 size=32.768k
   ArraySortIndicesStrings/32768/2             199055 ns       198622 ns         3482 bytes_per_second=157.334M/s items_per_second=20.6221M/s null_percent=50 size=32.768k
   ArraySortIndicesStrings/32768/1               6437 ns         6422 ns       108824 bytes_per_second=4.75177G/s items_per_second=637.772M/s null_percent=100 size=32.768k
   ArraySortIndicesStrings/32768/0             373233 ns       372412 ns         1863 bytes_per_second=83.9124M/s items_per_second=10.9986M/s null_percent=0 size=32.768k
   ArraySortIndicesStrings/1048576/100       17573012 ns     17544212 ns           40 bytes_per_second=56.9989M/s items_per_second=7.47095M/s null_percent=1 size=1048.58k
   ArraySortIndicesStrings/8388608/100      249132754 ns    248608949 ns            3 bytes_per_second=32.1791M/s items_per_second=4.21777M/s null_percent=1 size=8.38861M
   ArraySortIndicesStringsDict/32768/10000     198371 ns       198096 ns         3481 bytes_per_second=157.752M/s items_per_second=20.6768M/s null_percent=0.01 size=32.768k
   ArraySortIndicesStringsDict/32768/100       193011 ns       192768 ns         3604 bytes_per_second=162.112M/s items_per_second=21.2484M/s null_percent=1 size=32.768k
   ArraySortIndicesStringsDict/32768/10        169405 ns       169185 ns         4044 bytes_per_second=184.709M/s items_per_second=24.2101M/s null_percent=10 size=32.768k
   ArraySortIndicesStringsDict/32768/2         114190 ns       113755 ns         6003 bytes_per_second=274.713M/s items_per_second=36.0072M/s null_percent=50 size=32.768k
   ArraySortIndicesStringsDict/32768/1          72698 ns        72240 ns         9458 bytes_per_second=432.583M/s items_per_second=56.6995M/s null_percent=100 size=32.768k
   ArraySortIndicesStringsDict/32768/0         152237 ns       151268 ns         4589 bytes_per_second=206.588M/s items_per_second=27.0779M/s null_percent=0 size=32.768k
   ArraySortIndicesStringsDict/1048576/100    7357322 ns      7354462 ns           91 bytes_per_second=135.972M/s items_per_second=17.8221M/s null_percent=1 size=1048.58k
   ArraySortIndicesStringsDict/8388608/100   66893205 ns     66862644 ns            9 bytes_per_second=119.648M/s items_per_second=15.6825M/s null_percent=1 size=8.38861M
   ```
   
   We can see that there is a speed increase on a dict of strings compared to
a plain strings array (roughly 2x at 32768 elements and about 3.7x at 8388608
elements, both with 1% nulls), but not on a dict of integers. This is already
nice; I'll try to see if it can be improved further.
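
For readers who want to reproduce the comparison, the following small Python sketch computes speedup ratios from the wall-clock timings listed above (the `null_percent=1` rows only). The `TIMINGS_NS` table and the `speedup` and `report` helpers are illustrative names, not part of the benchmark suite; the numbers are copied from the first time column of the output.

```python
# Wall-clock timings in ns, copied from the benchmark output above
# (rows with null_percent=1), keyed by benchmark family and array size.
TIMINGS_NS = {
    "Strings": {32768: 381836, 1048576: 17573012, 8388608: 249132754},
    "StringsDict": {32768: 193011, 1048576: 7357322, 8388608: 66893205},
    "Int64Narrow": {32768: 12813, 1048576: 416093, 8388608: 5398365},
    "Int64Wide": {32768: 200730, 1048576: 9844567, 8388608: 103982502},
    "Int64Dict": {32768: 200484, 1048576: 7351865, 8388608: 68320713},
}


def speedup(baseline, candidate, size):
    """Ratio baseline/candidate; values above 1 mean the candidate is faster."""
    return TIMINGS_NS[baseline][size] / TIMINGS_NS[candidate][size]


def report(baseline, candidate):
    """Speedup per array size, rounded to two decimals."""
    return {
        size: round(speedup(baseline, candidate, size), 2)
        for size in sorted(TIMINGS_NS[baseline])
    }


if __name__ == "__main__":
    for baseline, candidate in [
        ("Strings", "StringsDict"),
        ("Int64Narrow", "Int64Dict"),
        ("Int64Wide", "Int64Dict"),
    ]:
        print(baseline, "->", candidate, report(baseline, candidate))
```

Which integer baseline is the relevant one (Narrow or Wide) depends on the value range of the data being sorted, so both are included here rather than picking one.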
   

