[
https://issues.apache.org/jira/browse/CALCITE-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759156#comment-17759156
]
Julian Hyde commented on CALCITE-5953:
--------------------------------------
[~libenchao],
{quote}So returning NULL could give us more information than 0.{quote}
I agree. SUM returns more information than SUM0. As implementors, we have to
retain that extra information, and that makes our implementation more complex.
If the user doesn't want to distinguish "a total that is 0 because there were
no rows" from "a total that is zero because the rows (if any) summed to zero"
then we would like to give them an efficient implementation but we can't
because they wrote SUM.
> AggregateCaseToFilterRule may make inaccurate SUM transformation
> ----------------------------------------------------------------
>
> Key: CALCITE-5953
> URL: https://issues.apache.org/jira/browse/CALCITE-5953
> Project: Calcite
> Issue Type: Bug
> Reporter: Zoltan Haindrich
> Assignee: Zoltan Haindrich
> Priority: Major
>
> consider: {{sum(case when x = 1 then 2 else 0 end) as b}}
> notice that this expression may only be null if there are no rows in the table
> {{AggregateCaseToFilterRule}} rewrites the above expression to {{sum(1)
> filter (where x=2)}} which broadens when it could be `null` to when there are
> no matches to the filter
> * *A* is *0* correctly in this case; but I think it will be still *0* in case
> there are 0 input rows
> * The result for *B* supposed to be *0* but since there are no matches by the
> filter it becomes *null*
> * *C* is not touched
> {code}
> # Convert CASE to FILTER without matches
> select sum(case when x = 1 then 1 else 0 end) as a,
> sum(case when x = 1 then 2 else 0 end) as b,
> sum(case when x = 1 then 3 else -1 end) as c
> from (values 0, null, 0, 2) as t(x);
> +---+---+----+
> | A | B | C |
> +---+---+----+
> | 0 | 0 | -4 |
> +---+---+----+
> (1 row)
> !ok
> EnumerableCalc(expr#0..2=[{inputs}], expr#3=[CAST($t0):INTEGER], A=[$t3],
> B=[$t1], C=[$t2])
> EnumerableAggregate(group=[{}], A=[COUNT() FILTER $1], B=[SUM($2) FILTER
> $3], C=[SUM($0)])
> EnumerableCalc(expr#0=[{inputs}], expr#1=[1], expr#2=[=($t0, $t1)],
> expr#3=[3], expr#4=[-1], expr#5=[CASE($t2, $t3, $t4)], expr#6=[IS TRUE($t2)],
> expr#7=[2], $f2=[$t5], $f3=[$t6], $f4=[$t7], $f5=[$t6])
> EnumerableValues(tuples=[[{ 0 }, { null }, { 0 }, { 2 }]])
> !plan
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)