Dandandan opened a new pull request #9116: URL: https://github.com/apache/arrow/pull/9116
Create hashes vectorized in hash join

This is based on the open PR https://github.com/apache/arrow/pull/9070

The idea is as follows:

* We still use the `HashMap`, but rather than using the data as the key we use the hash value (`u64`) as the key. This still allows the hash map to grow etc., but it does a bit more work on insert (an extra hash) than a custom vectorized hash map would.
* A collision check (for the probe side) still needs to be implemented. Hash concatenation/merging also needs to be improved: for now it is just h1 + h2 + h3 etc., which makes `(1, 2)` map to the same hash value as `(2, 1)`.
* Only the hash value creation is vectorized in this PR; the rest is still done row by row.

Even without the remaining parts, TPC-H is ~10% faster than the PR: ~180 ms vs ~200 ms.
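To make the points above concrete, here is a minimal sketch of the approach, not the code in this PR. It assumes plain `Vec<i64>` columns instead of Arrow arrays and uses the standard library's `DefaultHasher`. All function names (`hash_column`, `hash_rows`, `combine_add`, `combine_ordered`, `join`) are hypothetical. `combine_add` mirrors the h1 + h2 + h3 merging described above; `combine_ordered` is just one possible order-sensitive alternative chosen for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Additive merging, as described in the PR text: commutative, so
/// (1, 2) and (2, 1) end up with the same combined hash.
fn combine_add(acc: u64, h: u64) -> u64 {
    acc.wrapping_add(h)
}

/// Hypothetical order-sensitive alternative (an assumption, not the PR's code):
/// rotate the running hash before mixing in the next column's hash.
fn combine_ordered(acc: u64, h: u64) -> u64 {
    (acc.rotate_left(5) ^ h).wrapping_mul(0x9E37_79B9_7F4A_7C15)
}

/// Hash one whole column at a time, folding each value's hash into the
/// per-row running hash stored in `hashes`.
fn hash_column(column: &[i64], hashes: &mut [u64], combine: fn(u64, u64) -> u64) {
    for (slot, value) in hashes.iter_mut().zip(column) {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        *slot = combine(*slot, hasher.finish());
    }
}

/// Compute one combined hash per row, processing the key columns one by one.
fn hash_rows(columns: &[Vec<i64>], combine: fn(u64, u64) -> u64) -> Vec<u64> {
    let num_rows = columns.first().map_or(0, |c| c.len());
    let mut hashes = vec![0u64; num_rows];
    for column in columns {
        hash_column(column, &mut hashes, combine);
    }
    hashes
}

/// Compare the actual key values of a build row and a probe row.
fn rows_equal(left: &[Vec<i64>], l: usize, right: &[Vec<i64>], r: usize) -> bool {
    left.iter().zip(right).all(|(lc, rc)| lc[l] == rc[r])
}

/// Inner-join sketch: the map is keyed by the precomputed `u64` hash, so
/// `HashMap` hashes that key once more on insert (the "extra hash").
/// Returns (build_row, probe_row) index pairs.
fn join(
    build_cols: &[Vec<i64>],
    probe_cols: &[Vec<i64>],
    combine: fn(u64, u64) -> u64,
) -> Vec<(usize, usize)> {
    let mut map: HashMap<u64, Vec<usize>> = HashMap::new();
    for (row, hash) in hash_rows(build_cols, combine).into_iter().enumerate() {
        map.entry(hash).or_default().push(row);
    }

    let mut matches = Vec::new();
    for (probe_row, hash) in hash_rows(probe_cols, combine).into_iter().enumerate() {
        if let Some(candidates) = map.get(&hash) {
            for &build_row in candidates {
                // Collision check: equal hashes do not imply equal keys.
                if rows_equal(build_cols, build_row, probe_cols, probe_row) {
                    matches.push((build_row, probe_row));
                }
            }
        }
    }
    matches
}
```

With build rows `(1, 2)` and `(2, 1)`, `combine_add` gives both rows the same hash, so both land in the same map entry, and only the value comparison on the probe side keeps the join result correct. An order-sensitive combiner reduces how often that comparison has to reject a candidate, but it does not remove the need for the check.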
