Dandandan commented on issue #839:
URL: 
https://github.com/apache/arrow-datafusion/issues/839#issuecomment-894918837


   You are right that the current hash aggregate is quite a bit slower in this 
case than it should be.
   
   There is already some work by @alamb to make the hash aggregate faster for 
smaller keys, which already gives a ~2x speedup on a tougher query.
   https://github.com/apache/arrow-datafusion/issues/790
   
   I don't think the slow part is in the code you quoted: the `take` is only 
done once for each input array. The slower part is just below it. It runs for 
each combination of new input key and input array and calls e.g. `slice` on 
it, which adds up to a high overhead.
   There are some ideas linked in the issue to deal with that.
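   
   To make the difference concrete, here is a rough sketch of the pattern. It 
is not the actual DataFusion code: a plain `Vec<i64>` stands in for an Arrow 
array, and the names `group_indices`, `take` and `sum_by_key` are made up for 
illustration. The gather runs once per input column, while the slice step runs 
once per (key, column) pair. In the sketch a slice is just a borrowed 
sub-range, so it only shows how the number of calls scales, not the real cost 
of each call.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an Arrow array: a plain column of values.
type Column = Vec<i64>;

/// Group row indices by key and flatten them so that each group's rows are
/// contiguous. Returns the flattened indices plus (key, offset, len) per group.
fn group_indices(keys: &[u32]) -> (Vec<usize>, Vec<(u32, usize, usize)>) {
    let mut by_key: BTreeMap<u32, Vec<usize>> = BTreeMap::new();
    for (row, key) in keys.iter().enumerate() {
        by_key.entry(*key).or_default().push(row);
    }
    let mut indices = Vec::with_capacity(keys.len());
    let mut groups = Vec::with_capacity(by_key.len());
    for (key, rows) in by_key {
        groups.push((key, indices.len(), rows.len()));
        indices.extend(rows);
    }
    (indices, groups)
}

/// The "take" step: one gather per input column, however many keys there are.
fn take(column: &[i64], indices: &[usize]) -> Column {
    indices.iter().map(|&i| column[i]).collect()
}

/// Sum every column per key. Also returns how many slices were made, which
/// grows as (number of keys) x (number of columns).
fn sum_by_key(keys: &[u32], columns: &[Column]) -> (Vec<(u32, Vec<i64>)>, usize) {
    let (indices, groups) = group_indices(keys);
    // Done once per input column.
    let taken: Vec<Column> = columns.iter().map(|c| take(c, &indices)).collect();

    let mut slice_calls = 0;
    let mut out = Vec::with_capacity(groups.len());
    for (key, offset, len) in groups {
        let mut sums: Vec<i64> = Vec::with_capacity(taken.len());
        for col in &taken {
            // Done once per key *and* per column: this is the hot part.
            let slice = &col[offset..offset + len];
            slice_calls += 1;
            let total: i64 = slice.iter().sum();
            sums.push(total);
        }
        out.push((key, sums));
    }
    (out, slice_calls)
}
```

   With 3 distinct keys and 2 input columns this does 2 gathers but 6 slices, 
so with many small groups the per-slice cost dominates.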
   
   There are currently also some other parts of the code that contribute even 
more to the runtime, such as materializing the final keys/states into an 
array.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.