[GitHub] [arrow-datafusion] sunchao commented on issue #4973: Improve the performance of `Aggregator`, grouping, aggregation

via GitHub Thu, 29 Jun 2023 09:28:19 -0700


sunchao commented on issue #4973:
URL: https://github.com/apache/arrow-datafusion/issues/4973#issuecomment-1613496266


   > I did some further experiment in the bucketing to see if this is still the case today [branch](https://github.com/apache/arrow-datafusion/commits/bucketing).
   > This is very close to the description in the above MonetDB paper and written about here https://www.cockroachlabs.com/blog/vectorized-hash-joiner/ but doesn't really improve join performance, even when creating buckets to avoid collisions
   
   @Dandandan does DataFusion use SIMD in the hot paths of hash join? In my
experience, LLVM auto-vectorization is pretty fragile and often doesn't get
triggered when the logic is a bit complex. I think even `create_hashes` in
aggregation doesn't use SIMD (I could be wrong there).
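
   To illustrate what "a bit complex" can mean here, below is a rough sketch of two hash-combining loops. The names `mix`, `hash_simple` and `hash_with_nulls` and the mixing constant are made up for this example; this is not DataFusion's `create_hashes`. The first loop is the plain, branch-free shape that auto-vectorizers tend to handle; the second adds a per-row validity check, the sort of extra logic that may stop vectorization from kicking in. Whether either one is actually vectorized depends on the target and compiler version, so the emitted assembly is the only reliable check.

```rust
// Hypothetical helpers for illustration only; not DataFusion's `create_hashes`.

/// One multiplicative mixing step. The constant is an arbitrary odd 64-bit value.
#[inline]
fn mix(h: u64, v: u64) -> u64 {
    (h ^ v).wrapping_mul(0x9E37_79B9_7F4A_7C15)
}

/// Branch-free loop over two equal-length slices.
/// Iterating with `zip` avoids per-element bounds checks in the loop body.
pub fn hash_simple(values: &[u64], hashes: &mut [u64]) {
    assert_eq!(values.len(), hashes.len());
    for (h, v) in hashes.iter_mut().zip(values) {
        *h = mix(*h, *v);
    }
}

/// Same mixing step, but rows marked invalid keep their previous hash.
/// The indexed accesses and the data-dependent branch make the loop body
/// harder for the compiler to turn into SIMD code.
pub fn hash_with_nulls(values: &[u64], valid: &[bool], hashes: &mut [u64]) {
    for i in 0..values.len() {
        if valid[i] {
            hashes[i] = mix(hashes[i], values[i]);
        }
    }
}
```

   Both functions produce the same hashes when every row is valid; only the loop shape differs.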


