[ 
https://issues.apache.org/jira/browse/CALCITE-7362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18051874#comment-18051874
 ] 

Stamatis Zampetakis commented on CALCITE-7362:
----------------------------------------------

I don't have a strong preference on which representation to pick for the 
conditional aggregate; FILTER(WHERE) vs. CASE WHEN. I used CASE in the examples 
since this pattern appears more frequently when people refer to conditional 
aggregates but don't mind using FILTER(WHERE) in the actual implementation. As 
we already have rules to translate one to the other the actual choice does not 
make much of a difference.

Based on my interpretation of the standard and testing over few DBMS I believe 
the transformation is sound even if the filter removes all rows.

The rows that don't satisfy the condition:
 * in the WHERE clause never reach the aggregate;
 * in the SELECT clause (as conditional) become single-column null values.

The condition inside the aggregate becomes a <value expression> and is subject 
to the following SQL standard rules:
{noformat}
<general set function> ::= <set function type> <left paren> [ <set quantifier> 
] <value expression> <right paren>
...
If <general set function> is specified, then:
a) Let TX be the single-column table that is the result of applying the <value 
expression> to each row of T1 and eliminating null values.
{noformat}
According to the standard rules the single column null values that are 
generated from the conditional CASE statement in the SELECT clause are 
eliminated. The "TX" table does not contain nulls so the result of the 
aggregate shouldn't change no matter where we evaluate the condition. The 
behavior of the aggregate function on empty input is defined *later* where the 
virtual "TX" table is already computed.
{noformat}
d) ii)If TXA is empty, then the result is the null value.
{noformat}
So from SQL standard perspective the transformation is sound and both queries 
should return the same results.

>From a plan (RelNode) perspective the input to the {{Aggregate}} operator is 
>different:
 * empty result set in the case of Scan -> Filter -> Aggregate plan
 * single-column result set with all values null in the case of Scan -> Project 
-> Aggregate plan

but if the aggregate function eliminates null (as per standard) then the result 
should be the same in both cases.

The above analysis assumes that there is no GROUP BY clause so essentially we 
are dealing with grand totals. In the presence of grouping columns and grouping 
sets the transformation does not hold so not planning to tackle these special 
cases here.

> Add rule to transform WHERE clauses into conditional aggregates
> ---------------------------------------------------------------
>
>                 Key: CALCITE-7362
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7362
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>
> Add a rule to transform aggregate queries with filtering based on WHERE 
> clauses to conditional aggregates (CASE WHEN) without a WHERE clause.
> The proposed transformation using the SQL representation is shown below:
> +Before+
> {code:sql}
> select sum(ss_net_paid_inc_tax)
> from store_sales
> where ss_quantity < 20
> {code}
> +After+
> {code:sql}
> select sum(case when ss_quantity < 20 then ss_net_paid_inc_tax else null)
> from store_sales
> {code}
> The queries are equivalent and the transformation is valid for all aggregate 
> functions that [skip NULL input 
> values|https://github.com/apache/calcite/blob/c0d5a0832808fabfa32dea744415c0f46c516bce/core/src/main/java/org/apache/calcite/sql/SqlAggFunction.java#L233].
> The main motivation for introducing this rule is view-based rewriting where 
> it is quite common to have multiple conditional aggregates in the same 
> (materialized) view definition for precomputing and reusing expensive 
> aggregations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to