Dandandan edited a comment on issue #418: URL: https://github.com/apache/arrow-datafusion/issues/418#issuecomment-851552960
@jorgecarleitao Interesting! I did some earlier experiments with vectorized hashing too (and saw similar speedups for low-cardinality aggregates), but got a bit stuck on making the hash collision handling performant enough not to slow it down. I am not sure yet where hash collision detection happens in the experimental branch - what if two values map to the same `u64` hash? That is quite hard to trigger without writing dedicated unit tests - I also got all the tests passing without implementing anything for collisions. A trivial way to detect collisions without changing the code drastically (comparing `GroupByValues` while inserting) made it perform slower than the original implementation.
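To make the collision question concrete, here is a minimal sketch of the "compare the keys while inserting" idea. It is not the code from DataFusion or from the experimental branch: `Groups`, `Key`, and `insert` are hypothetical names, the key is a plain `Vec<String>` standing in for `GroupByValues`, and the hash is passed in precomputed so a collision can be forced by hand.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a group key (not DataFusion's `GroupByValues`).
type Key = Vec<String>;

/// Groups rows by a precomputed `u64` hash, falling back to a key
/// comparison so that two different keys with the same hash stay separate.
struct Groups {
    /// hash -> indices (into `keys`) of every group whose key has this hash
    buckets: HashMap<u64, Vec<usize>>,
    /// The actual key of each group, kept for collision checks
    keys: Vec<Key>,
    /// Row count per group (a stand-in for real accumulator state)
    counts: Vec<u64>,
}

impl Groups {
    fn new() -> Self {
        Groups {
            buckets: HashMap::new(),
            keys: Vec::new(),
            counts: Vec::new(),
        }
    }

    /// Adds one row with the given hash and key; returns its group index.
    fn insert(&mut self, hash: u64, key: &[String]) -> usize {
        // Borrow the fields separately so the bucket and the keys can be
        // used at the same time.
        let Groups { buckets, keys, counts } = self;
        let bucket = buckets.entry(hash).or_default();

        // Equal hashes are not enough: compare the stored key as well.
        let found = bucket
            .iter()
            .copied()
            .find(|&i| keys[i].as_slice() == key);

        let idx = match found {
            Some(i) => i,
            None => {
                // New key (possibly colliding with an existing hash):
                // create a fresh group and register it in the bucket.
                keys.push(key.to_vec());
                counts.push(0);
                let i = keys.len() - 1;
                bucket.push(i);
                i
            }
        };
        counts[idx] += 1;
        idx
    }
}
```

Calling `insert` with the same hash but different keys gives a cheap unit test for the collision path, which is otherwise hard to hit with real hash values. The key comparison on every insert is exactly the extra cost described above, so this sketch shows the correctness side only, not a fast implementation.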
