kumarUjjawal commented on issue #22505:
URL: https://github.com/apache/datafusion/issues/22505#issuecomment-4540811980
Hey @qianzii2,
The standard SQL already has a way to express exactly what you want, using
window functions applied to aggregate results. You can nest an aggregate inside
a window function:
```
SELECT a, b,
SUM(c) AS sum_ab,
SUM(SUM(c)) OVER (PARTITION BY b) AS sum_b_only
FROM t
GROUP BY a, b;
```
`SUM(SUM(c)) OVER (PARTITION BY b)` means: first compute `SUM(c) per (a, b)`
group (the inner aggregate), then sum those results across all rows sharing the
same b (the window function). This gives you the same result as your
`EXBY(SUM(c)`, a) — a single-scan plan with no self-join:
`DataSourceExec → AggregateExec (GROUP BY a, b) → WindowAggExec
(PARTITION BY b)`
This pattern works cleanly for SUM, COUNT, MIN, and MAX. For AVG it's a bit
more involved since avg-of-avgs isn't the same as avg-of-originals when group
sizes differ, you'd need to decompose it:
`SUM(SUM(c)) OVER (PARTITION BY b) / SUM(COUNT(c)) OVER (PARTITION BY b)`
So you don't need a new EXBY operator. Hope this unblocks you! Let me know
if this is clear or I can provide more details.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]