alamb commented on issue #2723: URL: https://github.com/apache/arrow-datafusion/issues/2723#issuecomment-1325109735
> Sure, but it reduces dyn dispatch by a lot (once per batch instead of once per group), removes the take kernel and the duplication can be hidden by careful macros/generics.

I understand your point. I would probably have to see a prototype to really understand how complicated it would be in practice. It doesn't feel right to me.

Another thing to consider is other potential aggregation algorithms:

1. Externalization (how would aggregator state be dumped to / read from external files?). Ideally this wouldn't have to be implemented and tested per algorithm.
2. GroupBy Merge (where the data is sorted by group keys, so all values for each group are contiguous in the input). This is sometimes used as part of an externalized group by hash (to avoid rehashing inputs).
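To make the GroupBy Merge case concrete, here is a minimal sketch in plain Rust. It assumes the input is already sorted by a single `i64` group key and that the aggregate is a sum. The function name `merge_group_sum` and the slice-based signature are hypothetical, chosen for illustration; this is not DataFusion's API and not a proposed implementation.

```rust
/// Sum `values` per group, assuming `keys` is already sorted so that all rows
/// of a group are adjacent. Hypothetical helper, not part of DataFusion.
pub fn merge_group_sum(keys: &[i64], values: &[i64]) -> Vec<(i64, i64)> {
    assert_eq!(keys.len(), values.len(), "keys and values must align");

    let mut out: Vec<(i64, i64)> = Vec::new();
    let mut rows = keys.iter().copied().zip(values.iter().copied());

    // Seed the single live accumulator with the first row, if there is one.
    let (mut current_key, mut sum) = match rows.next() {
        Some(first) => first,
        None => return out,
    };

    for (k, v) in rows {
        if k == current_key {
            // Still inside the same group: update the running aggregate.
            sum += v;
        } else {
            // Key changed: the previous group is complete and can be emitted.
            out.push((current_key, sum));
            current_key = k;
            sum = v;
        }
    }

    // Emit the final group.
    out.push((current_key, sum));
    out
}
```

Because the input is sorted, only one accumulator is live at a time and no hash table or per-row hashing is needed, which is the property that makes this algorithm attractive as the merge phase of an externalized group by.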
