Dandandan edited a comment on issue #790:
URL: 
https://github.com/apache/arrow-datafusion/issues/790#issuecomment-888991616


   > Nice write-up and very interesting discussions!
   > 
   > * By feeding the signature as a key to the `HashMap`, are we not hashing 
the original key twice? I guess this can easily be solved by setting the 
identity function instead of the default hasher on the `HashMap`  😃
   
   Yes, that's also what we currently do for the hash join algorithm. It's a 
small performance win. It also avoids the higher re-hashing cost when growing 
the hashmap.
   Hashing a `u64` was already much cheaper, though, than hashing a 
complex nested key.
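   
   As a rough illustration of the identity-hasher idea, here is a minimal 
sketch using only the standard library. The names `IdHasher`, `IdHashMap` and 
`group_rows` are made up for this example and are not taken from the 
DataFusion code; it assumes the keys are `u64` signatures that were already 
hashed elsewhere.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Hypothetical pass-through hasher: returns the `u64` key unchanged,
/// so an already-hashed signature is not hashed a second time.
#[derive(Default)]
struct IdHasher {
    hash: u64,
}

impl Hasher for IdHasher {
    fn finish(&self) -> u64 {
        self.hash
    }

    // `u64`'s `Hash` impl calls `write_u64`, so this is the only path used.
    fn write_u64(&mut self, i: u64) {
        self.hash = i;
    }

    fn write(&mut self, _bytes: &[u8]) {
        unreachable!("IdHasher only supports u64 keys");
    }
}

/// Hypothetical alias: a `HashMap` keyed by precomputed signatures.
type IdHashMap<V> = HashMap<u64, V, BuildHasherDefault<IdHasher>>;

/// Groups row indices by their precomputed signature.
/// Rows with equal signatures can still have different keys, so a real
/// implementation would compare the actual key values afterwards.
fn group_rows(signatures: &[u64]) -> IdHashMap<Vec<usize>> {
    let mut map: IdHashMap<Vec<usize>> = HashMap::default();
    for (row, &sig) in signatures.iter().enumerate() {
        map.entry(sig).or_default().push(row);
    }
    map
}
```

   Calling `group_rows(&[7, 3, 7, 9, 3])` gives three groups, with rows 0 
and 2 stored under signature 7. Because the map only reads back the stored 
`u64`, growing the table does not re-run an expensive hash function.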
   
   I believe a hashmap could also be implemented manually using a `Vec` and a 
number of buckets. When I tried it, it was slower, I think because the 
`HashMap` itself is quite fast for collision checks (maybe I will try this 
again when speeding up / vectorizing that part).
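
   For reference, a manual `Vec`-with-buckets table could look roughly like 
the sketch below. This is not the version I benchmarked; `BucketMap` and its 
methods are hypothetical names, and it assumes precomputed `u64` hashes with 
simple chaining per bucket.

```rust
/// Hypothetical chained hash table keyed by precomputed `u64` hashes.
struct BucketMap {
    /// Each bucket stores (hash, row index) pairs.
    buckets: Vec<Vec<(u64, usize)>>,
    /// Bit mask used to pick a bucket; bucket count is a power of two.
    mask: u64,
}

impl BucketMap {
    /// `num_buckets` is rounded up to a power of two so a mask can be used.
    fn with_buckets(num_buckets: usize) -> Self {
        let n = num_buckets.max(1).next_power_of_two();
        BucketMap {
            buckets: vec![Vec::new(); n],
            mask: (n - 1) as u64,
        }
    }

    /// Appends the row to the bucket selected by the low bits of `hash`.
    fn insert(&mut self, hash: u64, row: usize) {
        let idx = (hash & self.mask) as usize;
        self.buckets[idx].push((hash, row));
    }

    /// Scans one bucket and returns the rows whose stored hash equals `hash`.
    /// This linear scan is the collision check.
    fn get(&self, hash: u64) -> Vec<usize> {
        let idx = (hash & self.mask) as usize;
        self.buckets[idx]
            .iter()
            .filter(|(h, _)| *h == hash)
            .map(|(_, row)| *row)
            .collect()
    }
}
```

   The scan in `get` is the part that would need to be sped up or vectorized 
for this approach to compete with `HashMap`.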


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

