jonathanc-n commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3063872693
Another thing we can do is hash the keys once and use different parts of the hash during `RepartitionExec` and when building the hash table. This pays off even more when spilling, since spilling requires yet another repartition. If the optimizer repartitions the inputs of a hash join, we can refactor `RepartitionExec` to pass the hash values through to `HashJoinExec`. I'm aware that Velox does something like this. It would be somewhat more memory intensive, because it needs to compute and carry a 64-bit hash up front (instead of possibly just a `u32` hash), but each stage can then mask off its own bits instead of re-hashing.
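To make the bit-masking idea concrete, here is a minimal Rust sketch, assuming power-of-two partition, spill, and bucket counts. `HashBits`, `hash_key`, `repartition`, and `build_buckets` are hypothetical names, not DataFusion APIs, and `DefaultHasher` only stands in for whatever hash function the engine actually uses. Each key is hashed once; the top bits pick the output partition, the next bits pick a spill sub-partition, and the low bits pick a hash-table bucket, so the stages read disjoint bits and the join side reuses the forwarded hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: hash a join key once into 64 bits.
/// (DefaultHasher is a stand-in; a real engine uses its own hasher.)
fn hash_key<K: Hash>(key: &K) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical layout over one 64-bit hash (power-of-two fan-outs):
///   [ partition_bits | spill_bits | ...unused... | bucket bits ]
///     high bits                                    low bits
struct HashBits {
    partition_bits: u32, // selects the repartition output partition
    spill_bits: u32,     // selects a sub-partition if a partition spills
}

impl HashBits {
    fn partition(&self, hash: u64) -> usize {
        if self.partition_bits == 0 {
            return 0; // avoid an invalid `hash >> 64`
        }
        (hash >> (64 - self.partition_bits)) as usize
    }

    fn spill_partition(&self, hash: u64) -> usize {
        if self.spill_bits == 0 {
            return 0;
        }
        let shift = 64 - self.partition_bits - self.spill_bits;
        ((hash >> shift) & ((1u64 << self.spill_bits) - 1)) as usize
    }

    /// Bucket from the low bits, which the two stages above never read.
    fn bucket(&self, hash: u64, num_buckets: usize) -> usize {
        debug_assert!(num_buckets.is_power_of_two());
        debug_assert!(self.partition_bits + self.spill_bits + num_buckets.trailing_zeros() <= 64);
        (hash & (num_buckets as u64 - 1)) as usize
    }
}

/// Stand-in for a repartition step: hashes each key exactly once and
/// forwards (key, hash) pairs so the downstream join can reuse the hash.
fn repartition(keys: &[i64], bits: &HashBits) -> Vec<Vec<(i64, u64)>> {
    let mut parts = vec![Vec::new(); 1usize << bits.partition_bits];
    for &k in keys {
        let h = hash_key(&k);
        parts[bits.partition(h)].push((k, h));
    }
    parts
}

/// Stand-in for a join build side: buckets rows by the forwarded hash
/// without calling `hash_key` again.
fn build_buckets(rows: &[(i64, u64)], bits: &HashBits, num_buckets: usize) -> Vec<Vec<i64>> {
    let mut buckets = vec![Vec::new(); num_buckets];
    for &(k, h) in rows {
        buckets[bits.bucket(h, num_buckets)].push(k);
    }
    buckets
}

fn main() {
    // 4 output partitions, 8 spill sub-partitions per partition.
    let bits = HashBits { partition_bits: 2, spill_bits: 3 };
    let keys: Vec<i64> = (0..1_000).collect();
    let parts = repartition(&keys, &bits);
    for (i, part) in parts.iter().enumerate() {
        let buckets = build_buckets(part, &bits, 64);
        // A spilled partition could be split again by the next bits of the
        // same hash; here we just count rows in spill sub-partition 0.
        let sub0 = part.iter().filter(|&&(_, h)| bits.spill_partition(h) == 0).count();
        let used = buckets.iter().filter(|b| !b.is_empty()).count();
        println!("partition {i}: {} rows, {used} buckets used, {sub0} rows in spill sub-partition 0", part.len());
    }
}
```

Taking partition bits from the top and bucket bits from the bottom keeps the stages independent: rows that share a partition still spread across all buckets. The price is the extra 8 bytes per row for the forwarded `u64` hash.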