Dandandan commented on pull request #8863: URL: https://github.com/apache/arrow/pull/8863#issuecomment-740183272
As a next step after this, I think it would be interesting to look at calculating the hashes per column instead of per row, so that we benefit from the columnar data layout.

Some material I found on this:
https://www.cockroachlabs.com/blog/vectorized-hash-joiner/ (simple explanation)
https://pure.uva.nl/ws/files/4321270/68049_09.pdf

Please add more if you know of other or newer material!
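
To make the idea concrete, here is a minimal sketch of hashing column by column versus row by row. It is not the code in this PR: the function names (`combine`, `hash_rows`, `hash_columns`) are hypothetical, the columns are plain Python lists of integers rather than Arrow arrays, and the mixing step is an FNV-1a-style placeholder applied to whole integers, chosen only for illustration.

```python
from typing import List, Sequence

MASK64 = (1 << 64) - 1
FNV_OFFSET = 0xCBF29CE484222325  # 64-bit FNV offset basis
FNV_PRIME = 0x100000001B3        # 64-bit FNV prime


def combine(h: int, value: int) -> int:
    """Mix one integer into a running 64-bit hash (FNV-1a style, illustrative only)."""
    return ((h ^ (value & MASK64)) * FNV_PRIME) & MASK64


def hash_rows(columns: Sequence[Sequence[int]], num_rows: int) -> List[int]:
    """Row-at-a-time: for every row, visit each column in turn."""
    hashes = []
    for row in range(num_rows):
        h = FNV_OFFSET
        for col in columns:
            h = combine(h, col[row])
        hashes.append(h)
    return hashes


def hash_columns(columns: Sequence[Sequence[int]], num_rows: int) -> List[int]:
    """Column-at-a-time: one tight pass per column over a buffer of per-row hashes."""
    hashes = [FNV_OFFSET] * num_rows
    for col in columns:
        # The inner loop touches a single column, so it reads contiguous values.
        for row in range(num_rows):
            hashes[row] = combine(hashes[row], col[row])
    return hashes


# Two key columns, three rows; rows 0 and 2 hold the same key (1, 10).
key_columns = [[1, 2, 1], [10, 20, 10]]
row_hashes = hash_columns(key_columns, 3)
```

Both functions produce the same hashes. The difference is only the loop order: the column-wise version keeps the inner loop on one column at a time, which is the access pattern a columnar layout favors. Whether that pays off in practice would need to be benchmarked.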
