[GitHub] [arrow-datafusion] alamb commented on issue #2723: Consolidate GroupByHash implementations `row_hash.rs` and `hash.rs` (remove duplication)

GitBox Wed, 23 Nov 2022 05:58:49 -0800


alamb commented on issue #2723:
URL: https://github.com/apache/arrow-datafusion/issues/2723#issuecomment-1325109735


   > Sure, but it reduces dyn dispatch by a lot (once per batch instead of once
   > per group), removes the take kernel and the duplication can be hidden by
   > careful macros/generics.
   
   I understand your point. I would probably have to see a prototype to really
understand how complicated it would be in practice. It doesn't feel right to me.
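
   To make the trade-off under discussion concrete, here is a rough sketch of the two dispatch shapes. It is not DataFusion code: `GroupState`, `BatchedState`, `SumState`, `BatchedSum` and both driver functions are hypothetical names invented for illustration, and a plain `Vec<i64>` stands in for an Arrow array.

```rust
/// Hypothetical per-group interface: one boxed state object per group.
trait GroupState {
    fn update_batch(&mut self, values: &[i64]);
    fn result(&self) -> i64;
}

struct SumState(i64);

impl GroupState for SumState {
    fn update_batch(&mut self, values: &[i64]) {
        self.0 += values.iter().sum::<i64>();
    }
    fn result(&self) -> i64 {
        self.0
    }
}

/// Per-group dispatch: gather each group's rows (a stand-in for the take
/// kernel), then make one virtual call per group.
fn sum_per_group(group_ids: &[usize], values: &[i64], num_groups: usize) -> Vec<i64> {
    let mut states: Vec<Box<dyn GroupState>> = (0..num_groups)
        .map(|_| Box::new(SumState(0)) as Box<dyn GroupState>)
        .collect();
    for g in 0..num_groups {
        let mut taken = Vec::new();
        for (i, &id) in group_ids.iter().enumerate() {
            if id == g {
                taken.push(values[i]);
            }
        }
        states[g].update_batch(&taken); // one dyn call per group
    }
    states.iter().map(|s| s.result()).collect()
}

/// Hypothetical batched interface: one object holds the state of all groups.
trait BatchedState {
    fn update_batch(&mut self, group_ids: &[usize], values: &[i64]);
    fn results(&self) -> Vec<i64>;
}

struct BatchedSum {
    sums: Vec<i64>,
}

impl BatchedState for BatchedSum {
    fn update_batch(&mut self, group_ids: &[usize], values: &[i64]) {
        // No gather step: each row is added straight into its group's slot.
        for (&g, &v) in group_ids.iter().zip(values) {
            self.sums[g] += v;
        }
    }
    fn results(&self) -> Vec<i64> {
        self.sums.clone()
    }
}

/// Batched dispatch: one virtual call per input batch.
fn sum_batched(group_ids: &[usize], values: &[i64], num_groups: usize) -> Vec<i64> {
    let mut state: Box<dyn BatchedState> = Box::new(BatchedSum {
        sums: vec![0; num_groups],
    });
    state.update_batch(group_ids, values); // one dyn call per batch
    state.results()
}
```

   Both functions return the same per-group sums. The open question is how much per-aggregate code the batched shape requires once many aggregate functions and data types are involved, which this sketch does not show.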
   
   Another thing to consider is other potential aggregation algorithms:
   1. Externalization (how would aggregator state be dumped to / read back
from external files?). Ideally this wouldn't have to be implemented and tested
per algorithm.
   2. GroupBy Merge (where the data is sorted by group keys, so all values for
each group are contiguous in the input). This is sometimes used as part of
externalized group by hash (to avoid rehashing inputs).
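
   As a minimal sketch of the GroupBy Merge idea, assuming the input is already sorted by group key: the function name `merge_group_sum` and the `(key, value)` tuple layout are made up for illustration and are not taken from DataFusion.

```rust
/// Sum values per group, assuming `rows` is already sorted by group key.
/// Each group's rows are contiguous, so only one running sum is live at a
/// time and no hash table is needed.
fn merge_group_sum(rows: &[(u32, i64)]) -> Vec<(u32, i64)> {
    let mut out = Vec::new();
    let mut current: Option<(u32, i64)> = None;
    for &(key, value) in rows {
        current = match current {
            // Same key as the previous row: keep accumulating.
            Some((k, sum)) if k == key => Some((k, sum + value)),
            // Key changed (or first row): emit the finished group, if any,
            // and start a new running sum.
            finished => {
                out.extend(finished);
                Some((key, value))
            }
        };
    }
    // Emit the last group.
    out.extend(current);
    out
}
```

   If the input is not actually sorted, this emits the same key more than once rather than failing, so a caller would need to guarantee the ordering.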


