alchemist51 commented on issue #19649: URL: https://github.com/apache/datafusion/issues/19649#issuecomment-3731953824
Thanks @2010YOUY01 @alamb @adriangb @Rachelint for the feedbacks! Well given the complexity in the current `GroupedHashAggregateStream`, I think it might make sense to have another Stream which have a blocked approach. While I was diving deep into the code, I realised it was becoming trickier to handle the edge cases with respect to spill, different accumulators along with the exist approach. I'm a bit concerned about the diverging factor with introducing a code piece which won't be the first choice for a while. I think we will need to discuss about the proposal a bit more in the sense of how much coverage it should have initially and the order of priority (group by, multi group by, accumulators, data type, spill support, etc) so that we pick the right order and open opportunity for the community to drive it. The real benefits of the feature start showing off when we move to limited memory environments with high cardinality or accumulators operating on bigger columns(StringView for example) which will need significant effort but will make the system much more memory efficient. Another angle I have is that we are gonna keep innovating in the aggregation space and we need to make it bare metal so that new features also start onboarding on it so that we don't diverge too much. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
