js8544 commented on issue #36059: URL: https://github.com/apache/arrow/issues/36059#issuecomment-1591516918
I found out that we can pre-allocate buffer space for the hash table before inserting elements into it. This completely avoids rehashing during the insertion process and yields some performance improvements. On my machine I get: Before ``` 100 4.49 µs ± 94.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each) 16.2 µs ± 515 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each) 1000 8.75 µs ± 51.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each) 22.4 µs ± 414 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each) 10000 71.4 µs ± 1.84 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each) 53.7 µs ± 937 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each) 100000 3.78 ms ± 267 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 375 µs ± 1.33 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each) 1000000 41.2 ms ± 929 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 17.7 ms ± 705 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 10000000 679 ms ± 24.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 127 ms ± 1.46 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) ``` After ``` 100 3.73 µs ± 57.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each) 17 µs ± 156 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each) 1000 7.75 µs ± 190 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each) 23.2 µs ± 602 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each) 10000 44.5 µs ± 751 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each) 54.4 µs ± 926 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each) 100000 519 µs ± 27 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each) 378 µs ± 1.43 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each) 1000000 38.3 ms ± 520 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 17.4 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 10000000 557 ms ± 18.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 124 ms ± 3.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) ``` The improvement is outstanding for input size <= 100000 (up to serveral times faster), however it won't be too much for larger inputs as the rehashing cost was already amortized. There is still a large performance gap during the `Lookup` phase that remains unsolved. Nevertheless I'll submit a PR for this improvement. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
