Dandandan opened a new pull request #9116:
URL: https://github.com/apache/arrow/pull/9116


   Create hashes vectorized in hash join
   
   This is based on the open PR https://github.com/apache/arrow/pull/9070
   
   The idea is as follows:
   
   * We still use a `HashMap`, but rather than using the row data as the key, we
use the hash value (`u64`) as the key. This still lets the hashmap grow etc.,
but it does a bit more work on insert (an extra hash) than a custom vectorized
hashmap would.
   * A collision check (on the probe side) still needs to be implemented. Hash
concatenation/merging also needs to be improved: right now it is just
`h1 + h2 + h3` etc., which makes `(1, 2)` map to the same hash value as `(2, 1)`.
   * In this PR only the creation of the hash values is vectorized; the rest is
still done row by row.
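
   The three points above can be illustrated with a minimal sketch in plain std Rust. It is not the PR's code: key columns are modeled as hypothetical `Vec<i64>` stand-ins for Arrow arrays, and `hash_value`, `combine_hashes`, `create_hashes`, `build_table` and `probe` are hypothetical names. `combine_hashes` uses an assumed order-sensitive multiply-add scheme in place of plain `h1 + h2`, and `probe` includes the collision check listed above as still to be done.

   ```rust
   use std::collections::hash_map::DefaultHasher;
   use std::collections::HashMap;
   use std::hash::{Hash, Hasher};

   /// Hash one value with std's SipHash-based DefaultHasher (fixed keys,
   /// so equal inputs hash equally within one build of the program).
   fn hash_value(v: i64) -> u64 {
       let mut h = DefaultHasher::new();
       v.hash(&mut h);
       h.finish()
   }

   /// Assumed order-sensitive combine (illustrative constants). Unlike
   /// `l + r`, swapping the arguments changes the result in general.
   fn combine_hashes(l: u64, r: u64) -> u64 {
       let seed = (17u64 * 37).wrapping_add(l);
       seed.wrapping_mul(37).wrapping_add(r)
   }

   /// Column-at-a-time ("vectorized") hash creation: walk each key column
   /// once, updating a buffer that holds one hash per row.
   fn create_hashes(columns: &[Vec<i64>]) -> Vec<u64> {
       let num_rows = columns.first().map_or(0, |c| c.len());
       let mut hashes = vec![0u64; num_rows];
       for col in columns {
           for (h, &v) in hashes.iter_mut().zip(col.iter()) {
               *h = combine_hashes(*h, hash_value(v));
           }
       }
       hashes
   }

   /// Build side: key the map by the u64 hash, not by the row data.
   /// Note std's HashMap hashes the u64 key again on insert (the extra hash).
   fn build_table(hashes: &[u64]) -> HashMap<u64, Vec<usize>> {
       let mut map: HashMap<u64, Vec<usize>> = HashMap::new();
       for (row, &h) in hashes.iter().enumerate() {
           map.entry(h).or_default().push(row);
       }
       map
   }

   /// Probe side: equal hashes do not imply equal keys, so compare the
   /// actual key values of every candidate row (the collision check).
   fn probe(
       map: &HashMap<u64, Vec<usize>>,
       build_cols: &[Vec<i64>],
       probe_cols: &[Vec<i64>],
       probe_hashes: &[u64],
   ) -> Vec<(usize, usize)> {
       let mut matches = Vec::new();
       for (probe_row, h) in probe_hashes.iter().enumerate() {
           if let Some(candidates) = map.get(h) {
               for &build_row in candidates {
                   let keys_equal = build_cols
                       .iter()
                       .zip(probe_cols.iter())
                       .all(|(b, p)| b[build_row] == p[probe_row]);
                   if keys_equal {
                       matches.push((build_row, probe_row));
                   }
               }
           }
       }
       matches
   }

   fn main() {
       // Build keys (a, b): (1, 2), (2, 1), (3, 3); probe keys: (2, 1), (4, 4).
       let build_cols: Vec<Vec<i64>> = vec![vec![1, 2, 3], vec![2, 1, 3]];
       let probe_cols: Vec<Vec<i64>> = vec![vec![2, 4], vec![1, 4]];
       let map = build_table(&create_hashes(&build_cols));
       let matches = probe(&map, &build_cols, &probe_cols, &create_hashes(&probe_cols));
       println!("{:?}", matches); // prints [(1, 0)]
   }
   ```

   Because `probe` compares the key values, the result stays correct even if two different keys such as `(1, 2)` and `(2, 1)` happen to share a hash; the better combine function only reduces how often that wasted comparison happens.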
   
   Even without the remaining parts, TPC-H is ~10% faster than with the base PR (#9070): ~180 ms vs ~200 ms.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

