alchemist51 commented on issue #19649:
URL: https://github.com/apache/datafusion/issues/19649#issuecomment-3731953824

   Thanks @2010YOUY01 @alamb  @adriangb  @Rachelint  for the feedbacks!
   
   Well given the complexity in the current `GroupedHashAggregateStream`, I 
think it might make sense to have another Stream which have a blocked approach. 
While I was diving deep into the code, I realised it was becoming trickier to 
handle the edge cases with respect to spill, different accumulators along with 
the exist approach. 
   
   I'm a bit concerned about the diverging factor with introducing a code piece 
which won't be the first choice for a while. I think we will need to discuss 
about the proposal a bit more in the sense of how much coverage it should have 
initially and the order of priority (group by, multi group by, accumulators, 
data type, spill support, etc) so that we pick the right order and open 
opportunity for the community to drive it. The real benefits of the feature 
start showing off when we move to limited memory environments with high 
cardinality or accumulators operating on bigger columns(StringView for example) 
which will need significant effort but will make the system much more memory 
efficient. Another angle I have is that we are gonna keep innovating in the 
aggregation space and we need to make it bare metal so that new features also 
start onboarding on it so that we don't diverge too much.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to