[
https://issues.apache.org/jira/browse/CALCITE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18039657#comment-18039657
]
weihua zhang commented on CALCITE-3373:
---------------------------------------
To address this problem, maybe we can adopt an approach similar to Umbra DB
system. The steps would be: first convert IN or existential subquery into
{*}Mark Join{*}—given Mark Join’s strong expressiveness—then optimize them into
SEMI/ANTI joins via rules, and finally hand them over to RelDecorrelator for
processing.
I think introducing Mark Join is more elegant than the handling method in
[https://github.com/apache/calcite/pull/4211]. Additionally, Mark Join will
only serve as an internal representation within Calcite: it will either be
simplified into SEMI/ANTI joins or, if simplification is not feasible,
converted into Calcite’s existing literal_agg expression through transformation
rules. This process will be completely transparent to Calcite users.
> Decorrelate boolean context IN or existential subquery directly into
> SEMI/ANTI join
> -----------------------------------------------------------------------------------
>
> Key: CALCITE-3373
> URL: https://issues.apache.org/jira/browse/CALCITE-3373
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: Jin Xing
> Assignee: Viggo Chen
> Priority: Major
> Attachments: image-2025-03-02-17-04-54-608.png,
> image-2025-03-10-19-19-59-437.png
>
>
> The current approach to generate semi-join from IN/EXISTS is as below (except
> the green path):
> !image-2025-03-10-19-19-59-437.png!
> A more straightforward approach is to generate the semi-join or anti-join
> right in SubQueryRemoveRule (the green path).
> For example:
> {code:java}
> select * from sales.emp where NOT EXISTS (select * from emp e where
> emp.deptno = e.deptno) {code}
> The above sql can be convert to
> {code:java}
> LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4],
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
> LogicalJoin(condition=[=($7, $9)], joinType=[anti])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalProject(DEPTNO=[$7])
> LogicalFilter(condition=[true])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> For some subqueries that can not be rewrite to semi-join, such as subquery
> connected with OR, we can fallback to the exiting way.
> {code:java}
> select * from t1 where t1.a > 10 or t1.b in (select t2.c from t2) {code}
>
> This one is related to CALCITE-3367
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)