[GitHub] [arrow-datafusion] Dandandan commented on issue #418: [question] performance considerations of create_key_for_col (HashAggregate)

GitBox Mon, 31 May 2021 08:14:52 -0700


Dandandan commented on issue #418:
URL: 
https://github.com/apache/arrow-datafusion/issues/418#issuecomment-851552960



   @jorgecarleitao 
   
   Interesting!
   I did some earlier experiments with vectorized hashing too (and saw 
similar speedups for low-cardinality aggregates), but got stuck on 
making hash collision handling fast enough to not slow it down.
   
   I am not sure yet where hash collision detection happens in the experimental 
branch: what happens if two groups map to the same `u64` hash? That is quite hard 
to trigger without writing dedicated unit tests; I also got all the tests 
passing without implementing anything for collisions. A trivial way to detect 
collisions without changing the code drastically (comparing `GroupByValues` 
while inserting) made it slower than the original implementation.
   


