Simple answer is that an aggregate function is one that takes a set of values. For example, if you apply the SUM function to values [1, 2, 3] it returns 6.
When I execute a GROUP BY statement, the values of a particular column within a particular group (e.g. the group of rows for which deptno = 10) forms a set. And therefore I can apply an aggregate function to it. When I use an OVER clause, to ‘OVER (PARTITION BY deptno ORDER BY hiredate ROWS 3 PRECEDING)’ - the preceding 3 hires in the same department in as this employees - that also creates a set of values that I can apply the SUM function to. In the two cases, how the sets are formed is different, but the SUM function does the same thing. > On Jan 10, 2024, at 5:30 PM, Will Noble <[email protected]> wrote: > > Ok, I perhaps should have looked a little deeper. Looks like > org.apache.calcite.rel.core.Window is the "hypothetical" windowing layer I > described in assumption #4. > > I suppose it makes sense to call window functions "window aggregates" on a > conceptual level. It's just a little confusing that both AggregateCall > <https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/core/AggregateCall.java> > and Window.RexWinAggCall > <https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/core/Window.java> > define > their functions using SqlAggFunction. It feels to me like that's conflating > two different concepts, but perhaps that's not very important. > > Thanks > > On Wed, Jan 10, 2024 at 5:16 PM Mihai Budiu <[email protected]> wrote: > >> I am not 100% sure I understand your question, but we do implement window >> functions in our compiler using the Calcite IR. >> >> In our optimizer we use a Calcite rule which rewrites RexOver expressions >> into LogicalWindow operations: >> CoreRules.PROJECT_TO_LOGICAL_PROJECT_AND_WINDOW >> >> I find that the meaning of LogicalWindow as an IR representation is quite >> clean. Each group in a window has a list of aggregate calls, which work >> just like aggregate calls in a standard group-by setting. Our code handles >> both kinds of aggregates in the same way. >> >> Mihai >> >> ________________________________ >> From: Will Noble <[email protected]> >> Sent: Wednesday, January 10, 2024 4:13 PM >> To: [email protected] <[email protected]> >> Subject: Why are window functions considered agg function? >> >> I have a question about the way Calcite handles window functions. >> >> Here are my assumptions: >> >> - The purpose of aggregation is to merge rows of the input relation. >> Therefore, an Aggregate >> < >> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/core/Aggregate.java >>> >> layer can only exist in a relational expression if there is a >> corresponding GROUP >> BY clause in the corresponding SQL expression. If there is no explicit >> GROUP >> BY, then GROUP BY () is assumed implicitly, but logically there is a >> 1-to-1 correspondence between agg layers and (possibly implicit) GROUP >> BY >> clauses. >> - An agg function >> < >> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/SqlAggFunction.java >>> >> is only meant to appear in an aggregate layer; i.e. It should never >> appear >> in a projection layer as a rex function would. >> - Calcite generally treats a window function as an agg function with >> requiresOver=true. That's the purpose of the requiresOver field, which >> has existed since at least as early as 2015. >> - Window functions cannot be used with GROUP BY clauses. Invoking them >> never causes the rows of the input relation to merge as is >> characteristic >> of an aggregation. Therefore, they are not agg functions. It would >> probably >> make more sense for us to think of them as rex functions that should >> appear >> in a projection, even though they cannot be computed row-wise >> independently. Perhaps a new type of relational operator is needed >> besides >> aggregate layers and projection layers; a hypothetical "windowing >> layer". >> >> Which of my assumptions is wrong? >> >> Consider this BigQuery example using SUM as an agg function. This query >> would be invalid without the GROUP BY clause. It will return as many rows >> as there are unique names, and total_score is per-name. >> >> SELECT name, SUM(score) as total_score >> GROUP BY name >> FROM games >> >> Now consider this BQ example with SUM as a window function. It would be >> invalid to include any GROUP BY clause here. It will return as many rows as >> there are in the input table, and total_score is global (it will have the >> same value in every single row). >> >> SELECT name, SUM(score) OVER () as total_score >> FROM games >> >> Does anybody actually use window functions in Calcite as they're currently >> implemented? How can it possibly make sense to consider them as agg >> functions, when they can never be used in the same context as a "true" agg >> function (which requires grouping)? Seems to me like these are actually two >> completely different functions with zero overlap in terms of where they can >> appear in a relational expression; they just happen to share the name SUM >> and involve similar math. >> >> Thanks for any clarification / guidance. >>
