Re: [PR] feat: add `register_metadata` function for `GroupsAccumulator` to help create specialized impl [datafusion]

via GitHub Mon, 28 Apr 2025 08:41:43 -0700


alamb commented on PR #15022:
URL: https://github.com/apache/datafusion/pull/15022#issuecomment-2835680575


   > That is not always the case, some users like Comet for example build 
   > PhysicalPlan directly and execute that and does not use the optimizer at all.
   
   I wonder if we can take a step back and describe more precisely what 
we are trying to accomplish.
   
   Specifically, is the goal to improve performance after group spills? 
   
   
   If so, perhaps we could explore updating the `group_ordering` and 
`group_values`:
   
   
https://github.com/apache/datafusion/blob/9d2f04996604e709ee440b65f41e7b882f50b788/datafusion/physical-plan/src/aggregates/row_hash.rs#L417-L416
   
   It seems like the group values are instantiated only once initially:
   
https://github.com/apache/datafusion/blob/9d2f04996604e709ee440b65f41e7b882f50b788/datafusion/physical-plan/src/aggregates/row_hash.rs#L546-L545
   
   Thus, if the original input is not sorted by the group expressions, the 
group operator will not use the more memory-efficient version when merging 🤔 
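
   To make the idea concrete, here is a rough toy sketch of what I mean. All of the names below (`InputOrder`, `GroupStrategy`, `ToyGroupedAggregate`, `start_merging_spills`) are hypothetical and are not the real types in `row_hash.rs`. It also assumes the spilled runs are sorted by the group keys before they are merged:

```rust
// Toy model only: every type and method here is hypothetical and is NOT
// DataFusion's real API. It only illustrates "choose once" vs "re-choose on merge".

/// How the incoming rows are ordered relative to the group expressions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InputOrder {
    Unsorted,
    SortedByGroupKeys,
}

/// Which group-tracking strategy the operator uses.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GroupStrategy {
    /// Keeps every group in memory until the input is exhausted.
    HashAll,
    /// Can emit a group as soon as the key changes, so it holds less state.
    Streaming,
}

/// Map an input ordering to the strategy that suits it.
fn pick_strategy(order: InputOrder) -> GroupStrategy {
    match order {
        InputOrder::Unsorted => GroupStrategy::HashAll,
        InputOrder::SortedByGroupKeys => GroupStrategy::Streaming,
    }
}

pub struct ToyGroupedAggregate {
    strategy: GroupStrategy,
}

impl ToyGroupedAggregate {
    /// The strategy is chosen once, from the ordering of the ORIGINAL input.
    pub fn new(input_order: InputOrder) -> Self {
        Self {
            strategy: pick_strategy(input_order),
        }
    }

    pub fn strategy(&self) -> GroupStrategy {
        self.strategy
    }

    /// Hypothetical fix: when switching to merging spilled data, re-pick the
    /// strategy, assuming the spilled runs are sorted by the group keys.
    pub fn start_merging_spills(&mut self) {
        self.strategy = pick_strategy(InputOrder::SortedByGroupKeys);
    }
}
```

   Without something like `start_merging_spills`, an operator built over unsorted input would stay on `HashAll` for the merge phase too, which is the situation described above.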
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

