Re: [PR] feat: add `with_group_indices_order_mode` function for `GroupsAccumulator` to help create specialized impl [datafusion]

via GitHub Mon, 21 Apr 2025 10:24:25 -0700


ozankabak commented on PR #15022:
URL: https://github.com/apache/datafusion/pull/15022#issuecomment-2819068976


   The "ideal" flow in DataFusion is to check for ordering during planning and 
choose specialized executors (and accumulators) based on that information. We 
don't do this in all cases yet, but that's the direction the code has been 
evolving in.
   
   `GroupedHashAggregateStream` stores ordering-related information in the 
`group_ordering` attribute (of type `GroupOrdering`) and uses it for some 
emission-related optimizations. However, IIRC group accumulators do not assume 
any ordering information (and they are not optimized for ordered input).
   
   If there are certain group accumulators that can benefit from ordering 
information, why don't we define a new class of group accumulators for those 
and instantiate them instead of the more general, non-ordering-assuming ones?
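
The suggestion above could be sketched roughly as follows. This is an illustration only: `SimpleGroupsAccumulator`, `GeneralSum`, `OrderedSum`, and `make_sum_accumulator` are hypothetical names, and the trait is a much-simplified stand-in, not DataFusion's actual `GroupsAccumulator` API. The sketch assumes the planner already knows whether the input arrives ordered by group and picks the accumulator once, up front.

```rust
/// Hypothetical, simplified stand-in for a group accumulator interface.
/// It is NOT the real DataFusion `GroupsAccumulator` trait.
trait SimpleGroupsAccumulator {
    /// `group_indices[i]` is the group that `values[i]` belongs to.
    fn update_batch(&mut self, values: &[i64], group_indices: &[usize], total_num_groups: usize);
    /// Returns one sum per group and resets the internal state.
    fn evaluate(&mut self) -> Vec<i64>;
}

/// General accumulator: makes no assumption about the order of group indices.
#[derive(Default)]
struct GeneralSum {
    sums: Vec<i64>,
}

impl SimpleGroupsAccumulator for GeneralSum {
    fn update_batch(&mut self, values: &[i64], group_indices: &[usize], total_num_groups: usize) {
        assert_eq!(values.len(), group_indices.len());
        if self.sums.len() < total_num_groups {
            self.sums.resize(total_num_groups, 0);
        }
        // One state-vector access per input row.
        for (&v, &g) in values.iter().zip(group_indices) {
            self.sums[g] += v;
        }
    }

    fn evaluate(&mut self) -> Vec<i64> {
        std::mem::take(&mut self.sums)
    }
}

/// Specialized accumulator: intended for input whose rows are contiguous per
/// group, so it folds each run locally and touches the state once per run.
#[derive(Default)]
struct OrderedSum {
    sums: Vec<i64>,
}

impl SimpleGroupsAccumulator for OrderedSum {
    fn update_batch(&mut self, values: &[i64], group_indices: &[usize], total_num_groups: usize) {
        assert_eq!(values.len(), group_indices.len());
        if self.sums.len() < total_num_groups {
            self.sums.resize(total_num_groups, 0);
        }
        let mut i = 0;
        while i < values.len() {
            let g = group_indices[i];
            let mut run_sum = 0i64;
            // Consume the whole run of rows that share group `g`.
            while i < values.len() && group_indices[i] == g {
                run_sum += values[i];
                i += 1;
            }
            self.sums[g] += run_sum;
        }
    }

    fn evaluate(&mut self) -> Vec<i64> {
        std::mem::take(&mut self.sums)
    }
}

/// Hypothetical planning-time choice: the ordering flag is assumed to be
/// derived from the plan's ordering properties.
fn make_sum_accumulator(input_ordered_by_group: bool) -> Box<dyn SimpleGroupsAccumulator> {
    if input_ordered_by_group {
        Box::new(OrderedSum::default())
    } else {
        Box::new(GeneralSum::default())
    }
}

/// Feeds the same ordered batch to both variants and returns both results.
fn example() -> (Vec<i64>, Vec<i64>) {
    let values = [1, 2, 3, 4, 5];
    let group_indices = [0, 0, 1, 1, 2];

    let mut general = make_sum_accumulator(false);
    general.update_batch(&values, &group_indices, 3);

    let mut ordered = make_sum_accumulator(true);
    ordered.update_batch(&values, &group_indices, 3);

    (general.evaluate(), ordered.evaluate())
}
```

Both variants produce the same per-group sums; the only difference is how often the state vector is touched. Whether that kind of saving matters in practice is exactly what a concrete query or plan would help establish.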
   
   In any case, we really should have a specific example (e.g. a query or a 
plan) to work from as we collaborate on the right solution to this need. It 
will help us avoid losing time to misunderstandings. Can you share a minimal 
example that would benefit? Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

