Re: Why are window functions considered agg function?

Julian Hyde Fri, 12 Jan 2024 12:11:13 -0800

Simple answer is that an aggregate function is one that takes a set of values. 
For example, if you apply the SUM function to values [1, 2, 3] it returns 6.


When I execute a GROUP BY statement, the values of a particular column within a 
particular group (e.g. the group of rows for which deptno = 10) forms a set. 
And therefore I can apply an aggregate function to it.

When I use an OVER clause, to ‘OVER (PARTITION BY deptno ORDER BY hiredate ROWS 
3 PRECEDING)’ - the preceding 3 hires in the same department in as this 
employees - that also creates a set of values that I can apply the SUM function 
to.

In the two cases, how the sets are formed is different, but the SUM function 
does the same thing.

> On Jan 10, 2024, at 5:30 PM, Will Noble <[email protected]> wrote:
> 
> Ok, I perhaps should have looked a little deeper. Looks like
> org.apache.calcite.rel.core.Window is the "hypothetical" windowing layer I
> described in assumption #4.
> 
> I suppose it makes sense to call window functions "window aggregates" on a
> conceptual level. It's just a little confusing that both AggregateCall
> <https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/core/AggregateCall.java>
> and Window.RexWinAggCall
> <https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/core/Window.java>
> define
> their functions using SqlAggFunction. It feels to me like that's conflating
> two different concepts, but perhaps that's not very important.
> 
> Thanks
> 
> On Wed, Jan 10, 2024 at 5:16 PM Mihai Budiu <[email protected]> wrote:
> 
>> I am not 100% sure I understand your question, but we do implement window
>> functions in our compiler using the Calcite IR.
>> 
>> In our optimizer we use a Calcite rule which rewrites RexOver expressions
>> into LogicalWindow operations:
>> CoreRules.PROJECT_TO_LOGICAL_PROJECT_AND_WINDOW
>> 
>> I find that the meaning of LogicalWindow as an IR representation is quite
>> clean. Each group in a window has a list of aggregate calls, which work
>> just like aggregate calls in a standard group-by setting. Our code handles
>> both kinds of aggregates in the same way.
>> 
>> Mihai
>> 
>> ________________________________
>> From: Will Noble <[email protected]>
>> Sent: Wednesday, January 10, 2024 4:13 PM
>> To: [email protected] <[email protected]>
>> Subject: Why are window functions considered agg function?
>> 
>> I have a question about the way Calcite handles window functions.
>> 
>> Here are my assumptions:
>> 
>>   - The purpose of aggregation is to merge rows of the input relation.
>>   Therefore, an Aggregate
>>   <
>> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/core/Aggregate.java
>>> 
>>   layer can only exist in a relational expression if there is a
>> corresponding GROUP
>>   BY clause in the corresponding SQL expression. If there is no explicit
>> GROUP
>>   BY, then GROUP BY () is assumed implicitly, but logically there is a
>>   1-to-1 correspondence between agg layers and (possibly implicit) GROUP
>> BY
>>   clauses.
>>   - An agg function
>>   <
>> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/SqlAggFunction.java
>>> 
>>   is only meant to appear in an aggregate layer; i.e. It should never
>> appear
>>   in a projection layer as a rex function would.
>>   - Calcite generally treats a window function as an agg function with
>>   requiresOver=true. That's the purpose of the requiresOver field, which
>>   has existed since at least as early as 2015.
>>   - Window functions cannot be used with GROUP BY clauses. Invoking them
>>   never causes the rows of the input relation to merge as is
>> characteristic
>>   of an aggregation. Therefore, they are not agg functions. It would
>> probably
>>   make more sense for us to think of them as rex functions that should
>> appear
>>   in a projection, even though they cannot be computed row-wise
>>   independently. Perhaps a new type of relational operator is needed
>> besides
>>   aggregate layers and projection layers; a hypothetical "windowing
>> layer".
>> 
>> Which of my assumptions is wrong?
>> 
>> Consider this BigQuery example using SUM as an agg function. This query
>> would be invalid without the GROUP BY clause. It will return as many rows
>> as there are unique names, and total_score is per-name.
>> 
>> SELECT name, SUM(score) as total_score
>> GROUP BY name
>> FROM games
>> 
>> Now consider this BQ example with SUM as a window function. It would be
>> invalid to include any GROUP BY clause here. It will return as many rows as
>> there are in the input table, and total_score is global (it will have the
>> same value in every single row).
>> 
>> SELECT name, SUM(score) OVER () as total_score
>> FROM games
>> 
>> Does anybody actually use window functions in Calcite as they're currently
>> implemented? How can it possibly make sense to consider them as agg
>> functions, when they can never be used in the same context as a "true" agg
>> function (which requires grouping)? Seems to me like these are actually two
>> completely different functions with zero overlap in terms of where they can
>> appear in a relational expression; they just happen to share the name SUM
>> and involve similar math.
>> 
>> Thanks for any clarification / guidance.
>>

Re: Why are window functions considered agg function?

Reply via email to