ozankabak commented on PR #15022: URL: https://github.com/apache/datafusion/pull/15022#issuecomment-2819068976
The "ideal" flow in DF is to check for ordering during planning and choose specialized executors (and accumulators) based on this information. We don't do this in all cases yet, but that's how the code has been evolving. `GroupedHashAggregateStream` stores ordering-related information in the `group_ordering` attribute (of type `GroupOrdering`) and does some emission-related optimization with that. However, IIRC group accumulators do not assume any ordering information (and they are not optimized for cases when there is ordering). If there are certain group accumulators that can benefit from ordering information, why don't we define a new class of group accumulators for those and instantiate them instead of the more general, non-ordering-assuming ones? In any case, we really should have a specific example (e.g. a query or a plan) as we collaborate to find the right solution to this need. It will help us avoid losing time to misunderstandings. Can you share a minimal benefiting example? Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org