js8544 commented on issue #36059:
URL: https://github.com/apache/arrow/issues/36059#issuecomment-1591516918

   I found out that we can pre-allocate buffer space for the hash table before 
inserting elements into it. This completely avoids rehashing during the 
insertion process and yields some performance improvements. On my machine I get:
   
   Before
   ```
   100
   4.49 µs ± 94.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
   16.2 µs ± 515 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
   1000
   8.75 µs ± 51.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
   22.4 µs ± 414 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
   10000
   71.4 µs ± 1.84 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
   53.7 µs ± 937 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
   100000
   3.78 ms ± 267 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
   375 µs ± 1.33 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
   1000000
   41.2 ms ± 929 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
   17.7 ms ± 705 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
   10000000
   679 ms ± 24.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
   127 ms ± 1.46 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 
   ```
   
   After
   ```
   100
   3.73 µs ± 57.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
   17 µs ± 156 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
   1000
   7.75 µs ± 190 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
   23.2 µs ± 602 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
   10000
   44.5 µs ± 751 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
   54.4 µs ± 926 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
   100000
   519 µs ± 27 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
   378 µs ± 1.43 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
   1000000
   38.3 ms ± 520 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
   17.4 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
   10000000
   557 ms ± 18.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
   124 ms ± 3.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
   ```
   
   The improvement is outstanding for input size <= 100000 (up to serveral 
times faster), however it won't be too much for larger inputs as the rehashing 
cost was already amortized. There is still a large performance gap during the 
`Lookup` phase that remains unsolved.
   
   Nevertheless I'll submit a PR for this improvement.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to