Dandandan opened a new pull request #8832:
URL: https://github.com/apache/arrow/pull/8832


   This PR shows one area for improvement in the hash join. Currently the 
key `Vec` is hashed twice: first when looking up the key, and then again when 
inserting or mutating the value.
   Using the unstable `hash_raw_entry` API we can avoid this and get some 
speedup (mostly in the hash join).
   
   We could also use the hashbrown crate instead to avoid needing a nightly 
compiler.
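
To make the double hashing concrete, here is a minimal sketch on stable Rust. It is not the code from this PR: it uses the stable `entry` API rather than `hash_raw_entry`, and `CountingState`, `JoinMap`, and both `build_*` functions are hypothetical names made up for illustration. The `entry` API also hashes each key only once, but it needs an owned key per row, which is the cost a raw-entry style API is meant to avoid.

```rust
use std::cell::Cell;
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasher;
use std::rc::Rc;

/// Hypothetical helper: a `BuildHasher` that counts how often the map
/// asks for a fresh hasher, which happens once per key it hashes.
#[derive(Clone, Default)]
struct CountingState {
    hashes: Rc<Cell<usize>>,
}

impl BuildHasher for CountingState {
    type Hasher = DefaultHasher;

    fn build_hasher(&self) -> DefaultHasher {
        self.hashes.set(self.hashes.get() + 1);
        DefaultHasher::new()
    }
}

/// Hypothetical stand-in for the join map: key bytes -> matching row indices.
type JoinMap = HashMap<Vec<u8>, Vec<usize>, CountingState>;

/// Look up first, then insert on a miss: every new key is hashed twice.
fn build_with_lookup_then_insert(keys: &[Vec<u8>], state: CountingState) -> JoinMap {
    // Pre-size the map so that growing (and rehashing) does not skew the count.
    let mut map: JoinMap = HashMap::with_capacity_and_hasher(keys.len(), state);
    for (row, key) in keys.iter().enumerate() {
        if let Some(rows) = map.get_mut(key) {
            rows.push(row);
        } else {
            map.insert(key.clone(), vec![row]);
        }
    }
    map
}

/// One lookup per row through `entry`: the key is hashed once,
/// but it has to be cloned for every row, not only for new keys.
fn build_with_entry(keys: &[Vec<u8>], state: CountingState) -> JoinMap {
    let mut map: JoinMap = HashMap::with_capacity_and_hasher(keys.len(), state);
    for (row, key) in keys.iter().enumerate() {
        map.entry(key.clone()).or_insert_with(Vec::new).push(row);
    }
    map
}
```

Building both maps from the same keys with separate `CountingState` values and then comparing the two `hashes` counters should show fewer hash computations for the `entry` variant whenever the input contains at least a couple of distinct keys.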
   
   This brings the query 12 timings down from > 1500 ms locally to:
   ```
   Query 12 iteration 0 took 1425 ms
   Query 12 iteration 1 took 1427 ms
   Query 12 iteration 2 took 1481 ms
   Query 12 iteration 3 took 1465 ms
   Query 12 iteration 4 took 1469 ms
   Query 12 iteration 5 took 1455 ms
   Query 12 iteration 6 took 1482 ms
   Query 12 iteration 7 took 1478 ms
   Query 12 iteration 8 took 1480 ms
   Query 12 iteration 9 took 1463 ms
   ```
   
   FYI @jorgecarleitao @andygrove 

