2010YOUY01 commented on PR #23181:
URL: https://github.com/apache/datafusion/pull/23181#issuecomment-4852351969

   Thank you for the review, @alamb. The suggestions make sense to me, and I’ll 
try to address them later (likely tomorrow).
   
   > Longer term it would be nice to have some way to reduce the duplication (I 
realize some in inevitable, but there is a lot I think that is unecessary)
   
   I’m also quite tempted to reuse code here. For example, the partial-reduce 
variant could probably be implemented by adding a few flags to the existing 
partial aggregation path.
   
   - https://github.com/apache/datafusion/pull/23233
   
   My main concern is that these paths may evolve into very different shapes in 
the future, due to optimizations specialized for each case. I want to keep that 
future evolution easy.
   
   One example is the ordered aggregation variant. For the fully ordered case, 
I realized we may not need a hash table at all. We could likely make it faster 
by finding the group split points first and then aggregating each group 
directly.
   
   So as a middle ground, I think we should:
   
   * Keep the execution paths (streams) separate when their semantics are 
different.
   * Extract common utility functions for code that is almost identical. If the 
paths need to evolve differently later, we can then stop sharing those 
utilities.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to