sunchao commented on issue #4973: URL: https://github.com/apache/arrow-datafusion/issues/4973#issuecomment-1613496266
> I did some further experiments with bucketing to see if this is still the case today ([branch](https://github.com/apache/arrow-datafusion/commits/bucketing)). This is very close to the description in the MonetDB paper above and to what is written about here: https://www.cockroachlabs.com/blog/vectorized-hash-joiner/. However, it doesn't really improve join performance, even when creating buckets to avoid collisions.

@Dandandan does DF use SIMD in the hot paths of hash join? From my experience, LLVM auto-vectorization is pretty fragile and often doesn't get triggered if the logic is even a bit complex. I think even `create_hashes` in aggregation doesn't use SIMD (I could be wrong there).
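For readers unfamiliar with the bucketing scheme discussed above, here is a minimal Rust sketch of a bucket-chained hash table using `first`/`next` arrays, in the spirit of the CockroachDB vectorized hash joiner post. Hashing is done column-at-a-time in a plain, branch-free loop over a slice, which is the kind of loop LLVM *may* auto-vectorize. All names here (`BucketChainTable`, `hash_column`) and the hash mixer are hypothetical. They are not DataFusion's `JoinHashMap` or `create_hashes`. Whether the loop actually vectorizes depends on the target and flags, so check the generated assembly (e.g. `cargo rustc --release -- --emit asm`) rather than assuming it.

```rust
/// Hypothetical bucket-chained hash table (not DataFusion's JoinHashMap).
/// This sketch uses 1-based row ids so that 0 can act as the "empty / end of chain" sentinel.
struct BucketChainTable {
    mask: u64,
    /// first[bucket] = 1 + index of the head build row in that bucket's chain (0 = empty)
    first: Vec<u32>,
    /// next[row] = 1 + index of the next build row in the same chain (0 = end)
    next: Vec<u32>,
}

/// Hash a whole key column at once. The loop body has no branches and no
/// cross-iteration dependencies, which gives LLVM a chance to vectorize it
/// (not guaranteed).
fn hash_column(keys: &[u64], out: &mut Vec<u64>) {
    out.clear();
    out.extend(keys.iter().map(|&k| {
        // Illustrative multiply-xorshift mixer, not DataFusion's hash function.
        let h = k.wrapping_mul(0x9E37_79B9_7F4A_7C15);
        h ^ (h >> 32)
    }));
}

impl BucketChainTable {
    fn build(keys: &[u64]) -> Self {
        // Power-of-two bucket count so the bucket index is a cheap mask.
        let num_buckets = keys.len().next_power_of_two().max(1);
        let mask = (num_buckets - 1) as u64;
        let mut hashes = Vec::with_capacity(keys.len());
        hash_column(keys, &mut hashes);

        let mut first = vec![0u32; num_buckets];
        let mut next = vec![0u32; keys.len()];
        for (row, &h) in hashes.iter().enumerate() {
            let b = (h & mask) as usize;
            // Prepend this row to its bucket's chain.
            next[row] = first[b];
            first[b] = row as u32 + 1;
        }
        Self { mask, first, next }
    }

    /// Probe a batch of keys; returns (probe_row, build_row) pairs for equal keys.
    /// For brevity this walks each chain to completion per probe row; a fully
    /// vectorized probe would instead advance all chains one step at a time.
    fn probe(&self, build_keys: &[u64], probe_keys: &[u64]) -> Vec<(usize, usize)> {
        let mut hashes = Vec::with_capacity(probe_keys.len());
        hash_column(probe_keys, &mut hashes);

        let mut matches = Vec::new();
        for (p, &h) in hashes.iter().enumerate() {
            let mut cur = self.first[(h & self.mask) as usize];
            while cur != 0 {
                let b = (cur - 1) as usize;
                // Different keys can share a bucket, so compare the actual keys.
                if build_keys[b] == probe_keys[p] {
                    matches.push((p, b));
                }
                cur = self.next[b];
            }
        }
        matches
    }
}
```

Building from `[10, 20, 10]` and probing with `[10, 30]` yields the matches `(0, 2)` and `(0, 0)`: both build rows with key 10, in reverse insertion order because rows are prepended to their chains. Note that the chain walk in `probe` is data-dependent pointer chasing, which is exactly the kind of logic auto-vectorization tends not to handle.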
