[
https://issues.apache.org/jira/browse/CALCITE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhen Chen updated CALCITE-6930:
-------------------------------
Description:
JoinConditionOrExpansionRule transforms a join with OR conditions into a UNION
ALL of multiple joins.
For example, the SQL:
{code:java}
SELECT * FROM emp JOIN dept
ON emp.deptno = dept.deptno OR emp.deptno = dept.mgr {code}
Will be transformed into:
{code:java}
SELECT * FROM emp JOIN dept ON emp.deptno = dept.deptno
UNION ALL
SELECT * FROM emp JOIN dept ON emp.deptno = dept.mgr {code}
The original plan:
{code:java}
LogicalJoin(condition=[OR(=(deptno, deptno), =(deptno, mgr))])
LogicalTableScan(table=emp)
LogicalTableScan(table=dept) {code}
Is transformed to:
{code:java}
LogicalUnion(all=[true])
LogicalJoin(condition=[=(deptno, deptno)])
LogicalTableScan(table=emp)
LogicalTableScan(table=dept)
LogicalJoin(condition=[=(deptno, mgr)])
LogicalTableScan(table=emp)
LogicalTableScan(table=dept) {code}
Benefits:
# Each individual join can use hash-join algorithm, which is more efficient
than nested-loop joins required for OR conditions.
# Each join branch can independently choose its optimal join implementation.
# The joins can be executed in parallel.
# Individual joins may be able to use indexes that would not be usable with OR
conditions.
The rule only fires when:
# The OR condition contains only simple equi-join conditions (comparing
columns between left and right inputs).
Limitations and considerations:
# May not improve performance if individual joins produce large results that
need to be union-ed.
# Multiple hash tables may cause memory pressure.
# The optimizer should consider statistics and cost estimates when applying
this rule.
supplement:
# The inner join will perform the conversion as described above.
# The outer join will perform the conversion more complexly.
left join:
!image-2025-04-03-13-36-23-352.png|width=442,height=228!
was:
JoinConditionOrExpansionRule transforms a join with OR conditions into a UNION
ALL of multiple joins.
For example, the SQL:
{code:java}
SELECT * FROM emp JOIN dept
ON emp.deptno = dept.deptno OR emp.deptno = dept.mgr {code}
Will be transformed into:
{code:java}
SELECT * FROM emp JOIN dept ON emp.deptno = dept.deptno
UNION ALL
SELECT * FROM emp JOIN dept ON emp.deptno = dept.mgr {code}
The original plan:
{code:java}
LogicalJoin(condition=[OR(=(deptno, deptno), =(deptno, mgr))])
LogicalTableScan(table=emp)
LogicalTableScan(table=dept) {code}
Is transformed to:
{code:java}
LogicalUnion(all=[true])
LogicalJoin(condition=[=(deptno, deptno)])
LogicalTableScan(table=emp)
LogicalTableScan(table=dept)
LogicalJoin(condition=[=(deptno, mgr)])
LogicalTableScan(table=emp)
LogicalTableScan(table=dept) {code}
Benefits:
# Each individual join can use hash-join algorithm, which is more efficient
than nested-loop joins required for OR conditions.
# Each join branch can independently choose its optimal join implementation.
# The joins can be executed in parallel.
# Individual joins may be able to use indexes that would not be usable with OR
conditions.
The rule only fires when:
# The OR condition contains only simple equi-join conditions (comparing
columns between left and right inputs).
Limitations and considerations:
# May not improve performance if individual joins produce large results that
need to be union-ed.
# Multiple hash tables may cause memory pressure.
# The optimizer should consider statistics and cost estimates when applying
this rule.
supplement:
# The inner join will perform the conversion as described above.
# The outer join will perform the conversion more complexly.
left join:
{code:java}
// code placeholder
{code}
> Implementing JoinConditionOrExpansionRule
> -----------------------------------------
>
> Key: CALCITE-6930
> URL: https://issues.apache.org/jira/browse/CALCITE-6930
> Project: Calcite
> Issue Type: New Feature
> Reporter: Zhen Chen
> Assignee: Zhen Chen
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2025-04-03-13-36-23-352.png
>
>
> JoinConditionOrExpansionRule transforms a join with OR conditions into a
> UNION ALL of multiple joins.
> For example, the SQL:
> {code:java}
> SELECT * FROM emp JOIN dept
> ON emp.deptno = dept.deptno OR emp.deptno = dept.mgr {code}
> Will be transformed into:
> {code:java}
> SELECT * FROM emp JOIN dept ON emp.deptno = dept.deptno
> UNION ALL
> SELECT * FROM emp JOIN dept ON emp.deptno = dept.mgr {code}
> The original plan:
> {code:java}
> LogicalJoin(condition=[OR(=(deptno, deptno), =(deptno, mgr))])
> LogicalTableScan(table=emp)
> LogicalTableScan(table=dept) {code}
> Is transformed to:
> {code:java}
> LogicalUnion(all=[true])
> LogicalJoin(condition=[=(deptno, deptno)])
> LogicalTableScan(table=emp)
> LogicalTableScan(table=dept)
> LogicalJoin(condition=[=(deptno, mgr)])
> LogicalTableScan(table=emp)
> LogicalTableScan(table=dept) {code}
> Benefits:
> # Each individual join can use hash-join algorithm, which is more efficient
> than nested-loop joins required for OR conditions.
> # Each join branch can independently choose its optimal join implementation.
> # The joins can be executed in parallel.
> # Individual joins may be able to use indexes that would not be usable with
> OR conditions.
> The rule only fires when:
> # The OR condition contains only simple equi-join conditions (comparing
> columns between left and right inputs).
> Limitations and considerations:
> # May not improve performance if individual joins produce large results that
> need to be union-ed.
> # Multiple hash tables may cause memory pressure.
> # The optimizer should consider statistics and cost estimates when applying
> this rule.
>
> supplement:
> # The inner join will perform the conversion as described above.
> # The outer join will perform the conversion more complexly.
> left join:
> !image-2025-04-03-13-36-23-352.png|width=442,height=228!
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)