alamb commented on PR #15022: URL: https://github.com/apache/datafusion/pull/15022#issuecomment-2835680575
> That is not always the case, some users like Comet for example build PhysicalPlan directly and execute that and does not use the optimizer at all.

I wonder if we can take a step back and describe more precisely what we are trying to accomplish.

Specifically, is the goal to improve performance after group spills? If so, perhaps we could explore updating the `group_ordering` and `group_values`:

https://github.com/apache/datafusion/blob/9d2f04996604e709ee440b65f41e7b882f50b788/datafusion/physical-plan/src/aggregates/row_hash.rs#L417-L416

It seems like the group values are instantiated only once, initially:

https://github.com/apache/datafusion/blob/9d2f04996604e709ee440b65f41e7b882f50b788/datafusion/physical-plan/src/aggregates/row_hash.rs#L546-L545

Thus, if the original input is not sorted by the group expressions, the group operator will not use the more memory-efficient version when merging 🤔
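
To make the concern concrete, here is a toy sketch of the idea. It is not DataFusion code: `ToyAggregate`, `GroupOrderingMode`, and `switch_to_merge_phase` are hypothetical names, and the sketch assumes the merged spill stream arrives sorted by group key. It only illustrates why re-deciding the ordering mode after a spill could reduce the number of groups held in memory at once.

```rust
use std::collections::HashMap;

/// Hypothetical ordering modes, loosely inspired by the discussion above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GroupOrderingMode {
    /// Input is not sorted on the group keys: every group stays in memory
    /// until the input is exhausted.
    None,
    /// Input is sorted on the group keys: a group is complete as soon as
    /// the next key appears, so only one group is live at a time.
    Full,
}

/// Toy aggregate that sums values per key (an assumption for illustration).
struct ToyAggregate {
    mode: GroupOrderingMode,
}

impl ToyAggregate {
    /// The mode is chosen once, from the ordering of the original input.
    fn new(input_sorted_by_group_keys: bool) -> Self {
        let mode = if input_sorted_by_group_keys {
            GroupOrderingMode::Full
        } else {
            GroupOrderingMode::None
        };
        Self { mode }
    }

    /// Hypothetical hook: after a spill, the merged stream is assumed to be
    /// sorted by group key, so the streaming mode becomes usable.
    fn switch_to_merge_phase(&mut self) {
        self.mode = GroupOrderingMode::Full;
    }

    /// Returns (sorted results, peak number of groups held in memory).
    fn sum_by_key(&self, rows: &[(u32, i64)]) -> (Vec<(u32, i64)>, usize) {
        match self.mode {
            GroupOrderingMode::None => {
                let mut groups: HashMap<u32, i64> = HashMap::new();
                for &(key, value) in rows {
                    *groups.entry(key).or_insert(0) += value;
                }
                // All groups are live at the end of the input.
                let peak = groups.len();
                let mut out: Vec<(u32, i64)> = groups.into_iter().collect();
                out.sort_unstable();
                (out, peak)
            }
            GroupOrderingMode::Full => {
                let mut out = Vec::new();
                let mut current: Option<(u32, i64)> = None;
                for &(key, value) in rows {
                    current = match current.take() {
                        // Same key: keep accumulating into the live group.
                        Some((k, sum)) if k == key => Some((k, sum + value)),
                        // New key: the previous group is complete, emit it.
                        Some(done) => {
                            out.push(done);
                            Some((key, value))
                        }
                        None => Some((key, value)),
                    };
                }
                if let Some(done) = current {
                    out.push(done);
                }
                // At most one group is live at any point.
                let peak = if out.is_empty() { 0 } else { 1 };
                (out, peak)
            }
        }
    }
}
```

In this sketch, an aggregate created with `ToyAggregate::new(false)` keeps every group live even when it is later fed sorted rows, while the same aggregate after `switch_to_merge_phase()` holds only one group at a time. Whether the real operator can switch modes this cheaply is exactly the open question.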