[
https://issues.apache.org/jira/browse/CALCITE-6778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated CALCITE-6778:
------------------------------------
Labels: pull-request-available (was: )
> Inconsistent null behavior with correlated queries + quantify operators
> -----------------------------------------------------------------------
>
> Key: CALCITE-6778
> URL: https://issues.apache.org/jira/browse/CALCITE-6778
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.38.0
> Reporter: Rafael Acevedo
> Priority: Major
> Labels: pull-request-available
>
> Hi all!
> I believe I've found a bug in the quantify operators. The following queries
> should be equivalent, but yield different results:
> Query 1:
> SQL:
> {code:sql}
> --returns null, as expected
> SELECT TRUE = ALL (
> SELECT b
> FROM (SELECT * FROM (VALUES (TRUE), (NULL)) as x(a)) AS x1(b))
> AS test;
> {code}
> Plan:
> {code}
> EnumerableCalc(expr#0..2=[{inputs}], expr#3=[0], expr#4=[=($t0, $t3)],
> expr#5=[<>($t1, $t0)], expr#6=[1], expr#7=[<=($t1, $t6)],
> expr#8=[null:BOOLEAN], expr#9=[AND($t5, $t7, $t2, $t8)], expr#10=[=($t1,
> $t6)], expr#11=[=($t1, $t0)], expr#12=[>($t1, $t6)], expr#13=[OR($t11,
> $t12)], expr#14=[AND($t10, $t2, $t13)], expr#15=[OR($t4, $t9, $t14)],
> TEST=[$t15])
> EnumerableAggregate(group=[{}], c=[COUNT()], d=[COUNT($0)], m=[MAX($0)])
> EnumerableValues(tuples=[[{ true }, { null }]])
>
> {code}
> Query 2:
> SQL:
> {code:sql}
> --returns true but should be null
> WITH tb as (select array(SELECT * FROM (VALUES (TRUE), (NULL)) as x(a)) as a)
> SELECT TRUE = ALL (
> SELECT b
> FROM UNNEST(a) AS x1(b)
> ) AS test
> FROM tb;
> {code}
> Plan:
> {code}
> EnumerableCalc(expr#0..4=[{inputs}], expr#5=[IS NULL($t4)], expr#6=[0],
> expr#7=[=($t1, $t6)], expr#8=[OR($t5, $t7)], expr#9=[IS TRUE($t8)],
> expr#10=[<>($t2, $t1)], expr#11=[1], expr#12=[<=($t2, $t11)],
> expr#13=[AND($t10, $t12)], expr#14=[IS TRUE($t13)], expr#15=[null:BOOLEAN],
> expr#16=[IS NOT TRUE($t8)], expr#17=[AND($t14, $t3, $t15, $t16)],
> expr#18=[=($t2, $t11)], expr#19=[IS TRUE($t18)], expr#20=[IS NOT TRUE($t13)],
> expr#21=[AND($t19, $t3, $t16, $t20)], expr#22=[OR($t9, $t17, $t21)],
> TEST=[$t22])
> EnumerableCorrelate(correlation=[$cor0], joinType=[left],
> requiredColumns=[{0}])
> EnumerableCollect(field=[x])
> EnumerableValues(tuples=[[{ true }, { null }]])
> EnumerableCalc(expr#0..2=[{inputs}], expr#3=[true],
> proj#0..3=[{exprs}])
> EnumerableAggregate(group=[{}], c=[COUNT() FILTER $3],
> d=[COUNT($0) FILTER $2], m=[MIN($1) FILTER $3])
> EnumerableCalc(expr#0..2=[{inputs}],
> expr#3=[0], expr#4=[=($t2, $t3)], expr#5=[1], expr#6=[=($t2, $t5)],
> proj#0..1=[{exprs}], $g_0=[$t4], $g_1=[$t6])
> EnumerableAggregate(group=[{0}],
> groups=[[{0}, {}]], m=[MAX($0)], $g=[GROUPING($0)])
> EnumerableUncollect
>
> EnumerableCalc(expr#0=[{inputs}], expr#1=[$cor0], expr#2=[$t1.A], A=[$t2])
>
> EnumerableValues(tuples=[[{ 0 }]])
>
>
> {code}
> Looking at the code, {{SubQueryRemoveRule}} produces different plans because
> query 2 is correlated. [Code
> pointer|https://github.com/apache/calcite/blob/575fc1a583b3f5b5febdee40da4c5cb46c9022b1/core/src/main/java/org/apache/calcite/rel/rules/SubQueryRemoveRule.java#L230].
> A possible cause for different values is that for correlated queries (query 2
> in this case) {{SubQueryRemoveRule}}
> [uses|https://github.com/apache/calcite/blob/575fc1a583b3f5b5febdee40da4c5cb46c9022b1/core/src/main/java/org/apache/calcite/rel/rules/SubQueryRemoveRule.java#L408-L409]
> {{count(distinct *)}} and {{count(distinct <col>)}}. Non-correlated queries
> (query 1 in this case) use {{count ( * )}} and {{count(<col>)}}, which behave
> differently with null values. I assume there's a reason to use distinct,
> right?
> One possible change is modifying the correlated case to use non-distinct
> counts. I tested this change locally and no test breaks, but I'd like to get
> some better understanding of the reasoning behind using distinct.
> Can anyone help me? maybe [~nobigo]?
> Thanks!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)