[GitHub] [arrow-datafusion] sunchao commented on issue #4973: Improve the performance of `Aggregator`, grouping, aggregaton

via GitHub Wed, 08 Mar 2023 14:15:38 -0800


sunchao commented on issue #4973:
URL: https://github.com/apache/arrow-datafusion/issues/4973#issuecomment-1460948372


   I'm actually working on a POC to improve hash aggregation performance
using a very similar approach. The only difference is that I'm not using `Rows`
in the `update_batch` API. Instead I use the row format defined in DataFusion
(DF). It seems that `Rows` in `arrow-rs` incurs extra costs because it is
designed for sorting and must preserve ordering. The cost is especially high
for dictionary-encoded arrays.
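   Why can a hash-oriented row format be cheaper than a sort-oriented one? Hash grouping only needs key *equality*, whereas a sort-oriented format must also make byte-wise comparison match value order. The std-only Rust sketch below illustrates that general point for a single `i32` group key. The functions `encode_order_preserving`, `encode_for_hashing` and `group_sum` are hypothetical illustrations. They are not the arrow-rs `Rows`/`RowConverter` API or DataFusion's row format, and the sketch does not model the dictionary-array case mentioned above.

```rust
use std::collections::HashMap;

/// Hypothetical order-preserving encoding: flip the sign bit and write
/// big-endian, so byte-wise comparison of encodings matches numeric order.
/// Sort-oriented row formats need this extra transformation.
fn encode_order_preserving(v: i32) -> [u8; 4] {
    ((v as u32) ^ 0x8000_0000).to_be_bytes()
}

/// Hypothetical hash-only encoding: native bytes are enough, because
/// grouping only needs equality, not ordering.
fn encode_for_hashing(v: i32) -> [u8; 4] {
    v.to_ne_bytes()
}

/// Sums `values` grouped by `keys`. `encode` turns each key into the byte
/// "row" used as the hash-map key. Groups are returned in first-seen order.
fn group_sum(keys: &[i32], values: &[i64], encode: fn(i32) -> [u8; 4]) -> Vec<(i32, i64)> {
    let mut index: HashMap<[u8; 4], usize> = HashMap::new();
    let mut groups: Vec<(i32, i64)> = Vec::new();
    for (&k, &v) in keys.iter().zip(values) {
        let next = groups.len();
        // Look up the group id for this encoded row, allocating a new one if unseen.
        let g = *index.entry(encode(k)).or_insert(next);
        if g == next {
            groups.push((k, 0));
        }
        groups[g].1 += v;
    }
    groups
}

fn main() {
    let keys = [3, -1, 3, 7, -1];
    let values = [10, 20, 30, 40, 50];

    // Both encodings produce the same grouping result...
    let a = group_sum(&keys, &values, encode_order_preserving);
    let b = group_sum(&keys, &values, encode_for_hashing);
    assert_eq!(a, b);
    println!("{:?}", a); // [(3, 40), (-1, 70), (7, 40)]

    // ...but only the order-preserving one also sorts correctly byte-wise.
    assert!(encode_order_preserving(-1) < encode_order_preserving(3));
}
```

   The order-preserving work is wasted for hash aggregation. That overhead grows with more complex key types.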
   
   The approach requires quite a few API changes. I was able to see a big
improvement, at least for simple cases, though I haven't run comprehensive
benchmarks.
   


