Re: [PR] refactor: Split hash aggregation logic into separated streams [datafusion]

via GitHub Wed, 03 Jun 2026 19:57:36 -0700


Rachelint commented on PR #22729:
URL: https://github.com/apache/datafusion/pull/22729#issuecomment-4618544977


   > The existing challenge is that the current implementation is hard to 
extend and review. I want to clean things up through this refactor first, and 
then apply the actual change.
   
   Current one seems the refactor of original `row_hash.rs`?
   
   But in #155, code changes in `row_hash.rs` actually very few(154 lines), and 
main code changes are blocked version implementation of `accumulators` and 
`group values`, and test codes(3000+ lines).
   
   I totally agree with we should refactor code in `row_hash.rs`. It is very 
very messy and hard to maintain as I complained before, due to we mix the 
`partial` and `final` logic in a `single stream`. 
   But seems it may not help very much to reduce codes about supporting  
blocked memory management?
   
   I think maybe #15591 and refactor can be pushed forward in parallel, they 
don't have very much in common.
   As said above only few codes are for supporting blocked mode in the aggr 
stream, large codes are for blocked version `accumulators` and `group values`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] refactor: Split hash aggregation logic into separated streams [datafusion]

Reply via email to