Rachelint commented on PR #22729: URL: https://github.com/apache/datafusion/pull/22729#issuecomment-4618544977
> The existing challenge is that the current implementation is hard to extend and review. I want to clean things up through this refactor first, and then apply the actual change. Current one seems the refactor of original `row_hash.rs`? But in #155, code changes in `row_hash.rs` actually very few(154 lines), and main code changes are blocked version implementation of `accumulators` and `group values`, and test codes(3000+ lines). I totally agree with we should refactor code in `row_hash.rs`. It is very very messy and hard to maintain as I complained before, due to we mix the `partial` and `final` logic in a `single stream`. But seems it may not help very much to reduce codes about supporting blocked memory management? I think maybe #15591 and refactor can be pushed forward in parallel, they don't have very much in common. As said above only few codes are for supporting blocked mode in the aggr stream, large codes are for blocked version `accumulators` and `group values`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
