[
https://issues.apache.org/jira/browse/CALCITE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18043175#comment-18043175
]
Zhen Chen commented on CALCITE-7317:
------------------------------------
At first glance, this seems relevant to CALCITE-5132.
> SubQueryRemoveRule should skip NULL-safety checks for IN subqueries when both
> the keys and the subquery columns are NOT NULL
> ----------------------------------------------------------------------------------------------------------------------------
>
> Key: CALCITE-7317
> URL: https://issues.apache.org/jira/browse/CALCITE-7317
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.41.0
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.42.0
>
>
> SubQueryRemoveRule.rewriteIn() currently generates COUNT aggregates for
> NULL-safety checks in all IN subqueries, even when both the keys (left side)
> and the subquery columns are NOT NULL.
> When both are NOT NULL, the evaluation reduces to TRUE/FALSE, so no extra
> logic is needed to account for NULL values.
> Current behavior:
> {noformat}
> SELECT e.deptno, e.deptno IN (SELECT deptno FROM emp)
> FROM emp AS e{noformat}
> Generates plan with expensive NULL-safety logic:
> {noformat}
> SELECT e.deptno,
> CASE
> WHEN ct.c = 0 THEN false
> WHEN e.deptno IS NULL THEN null
> WHEN dt.i IS NOT NULL THEN true
> WHEN ct.ck < ct.c THEN null
> ELSE false
> END
> FROM emp AS e
> LEFT JOIN (
> (SELECT COUNT(*) AS c, COUNT(deptno) AS ck FROM emp) AS ct
> CROSS JOIN (SELECT DISTINCT deptno, true AS i FROM emp)) AS dt
> ON e.deptno = dt.deptno{noformat}
> When the key (e.deptno) and the subquery column (deptno) are NOT NULL, the
> expected result is to skip the COUNT aggregates and use simpler anti-join, as
> null safety is not needed:
> {noformat}
> SELECT e.deptno,
> CASE
> WHEN dt.i IS NOT NULL THEN true
> ELSE false
> END
> FROM emp AS e
> LEFT JOIN (SELECT DISTINCT deptno, true AS i FROM emp) AS dt
> ON e.deptno = dt.deptno {noformat}
> This has a significant performance impact, as it eliminates a CROSS JOIN with
> aggregate computation, significantly reducing query cost.
> Existing comments already hint into this optimization
> ([SubQueryRemoveRule.java:598-607|https://github.com/apache/calcite/blob/f2df771d86f52e15fba8f626eaeb8b0b9b944823/core/src/main/java/org/apache/calcite/rel/rules/SubQueryRemoveRule.java#L598-L607]):
> {noformat}
> // If keys are not null we can remove "ct" and simplify to
> //
> // select e.deptno,
> // case
> // when dt.i is not null then true
> // else false
> // end
> // from emp as e
> // left join (select distinct deptno, true as i from emp) as dt
> // on e.deptno = dt.deptno{noformat}
> Proposed Solution:
> Conditionally generate the COUNT aggregates only when we detect we need null
> safety (either the keys or the subquery columns are nullable).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)