Dandandan opened a new pull request #8832: URL: https://github.com/apache/arrow/pull/8832
This PR shows one area for improvement in the hash join. Currently the `Vec` is hashed twice: first when looking up the key, and then again when inserting or mutating the value. Using the unstable `hash_raw_entry` API we can avoid the second hash and get some speedup (mostly in the hash join). We could also use the `hashbrown` crate instead, to avoid needing a nightly compiler.

Locally, this brings the Query 12 iteration times down from > 1500 ms to:

```
Query 12 iteration 0 took 1425 ms
Query 12 iteration 1 took 1427 ms
Query 12 iteration 2 took 1481 ms
Query 12 iteration 3 took 1465 ms
Query 12 iteration 4 took 1469 ms
Query 12 iteration 5 took 1455 ms
Query 12 iteration 6 took 1482 ms
Query 12 iteration 7 took 1478 ms
Query 12 iteration 8 took 1480 ms
Query 12 iteration 9 took 1463 ms
```

FYI @jorgecarleitao @andygrove
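To illustrate the double hashing described above, here is a minimal sketch on stable Rust. It is not the code from this PR: it uses the stable `HashMap::entry` API as a stand-in for the unstable raw entry API, and `CountedKey`, `insert_two_step`, `insert_single_lookup`, and `count_hashes_for_new_key` are hypothetical names made up for the example. The key type counts how often it is hashed, so the two insertion styles can be compared directly.

```rust
use std::cell::Cell;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

thread_local! {
    // Counts how many times a `CountedKey` is hashed on this thread.
    static HASH_CALLS: Cell<usize> = Cell::new(0);
}

/// Hypothetical stand-in for a join key made of a `Vec` of values.
#[derive(Clone, PartialEq, Eq)]
struct CountedKey(Vec<u64>);

impl Hash for CountedKey {
    fn hash<H: Hasher>(&self, state: &mut H) {
        HASH_CALLS.with(|c| c.set(c.get() + 1));
        self.0.hash(state);
    }
}

fn reset_hash_calls() {
    HASH_CALLS.with(|c| c.set(0));
}

fn hash_calls() -> usize {
    HASH_CALLS.with(|c| c.get())
}

/// Look up first, then insert on a miss: a new key is hashed by the
/// lookup and hashed again by the insert.
fn insert_two_step(map: &mut HashMap<CountedKey, Vec<usize>>, key: CountedKey, row: usize) {
    match map.get_mut(&key) {
        Some(rows) => rows.push(row),
        None => {
            map.insert(key, vec![row]);
        }
    }
}

/// Single lookup through the stable `entry` API: the key is hashed once,
/// and the same slot is used to insert or mutate.
fn insert_single_lookup(map: &mut HashMap<CountedKey, Vec<usize>>, key: CountedKey, row: usize) {
    map.entry(key).or_insert_with(Vec::new).push(row);
}

/// Returns (hashes with two-step insert, hashes with single lookup) for
/// inserting one new key into a map that already holds another key.
fn count_hashes_for_new_key() -> (usize, usize) {
    // Pre-size both maps so that no resize (and rehash) happens while measuring.
    let mut a = HashMap::with_capacity(16);
    insert_two_step(&mut a, CountedKey(vec![0]), 0);
    reset_hash_calls();
    insert_two_step(&mut a, CountedKey(vec![1, 2]), 1);
    let two_step = hash_calls();

    let mut b = HashMap::with_capacity(16);
    insert_single_lookup(&mut b, CountedKey(vec![0]), 0);
    reset_hash_calls();
    insert_single_lookup(&mut b, CountedKey(vec![1, 2]), 1);
    let single = hash_calls();

    (two_step, single)
}
```

Calling `count_hashes_for_new_key()` returns the two hash counts for inserting a previously unseen key. One difference from the approach in this PR: `entry` takes an owned key, whereas the raw entry API can look up by a borrowed key and only needs an owned key when inserting.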
