[ 
https://issues.apache.org/jira/browse/CALCITE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Chen updated CALCITE-6930:
-------------------------------
    Description: 
JoinConditionOrExpansionRule transforms a join with OR conditions into a UNION 
ALL of multiple joins.

For example, the SQL:
{code:java}
SELECT * FROM emp JOIN dept 
ON emp.deptno = dept.deptno OR emp.deptno = dept.mgr {code}
Will be transformed into:
{code:java}
SELECT * FROM emp JOIN dept ON emp.deptno = dept.deptno
UNION ALL 
SELECT * FROM emp JOIN dept ON emp.deptno = dept.mgr {code}
The original plan:
{code:java}
LogicalJoin(condition=[OR(=(deptno, deptno), =(deptno, mgr))])
  LogicalTableScan(table=emp)
    LogicalTableScan(table=dept) {code}
Is transformed to:
{code:java}
LogicalUnion(all=[true])
  LogicalJoin(condition=[=(deptno, deptno)])
   LogicalTableScan(table=emp)
   LogicalTableScan(table=dept)
  LogicalJoin(condition=[=(deptno, mgr)])
  LogicalTableScan(table=emp)
  LogicalTableScan(table=dept) {code}
Benefits:
 # Each individual join can use hash-join algorithm, which is more efficient 
than nested-loop joins required for OR conditions.
 # Each join branch can independently choose its optimal join implementation.
 # The joins can be executed in parallel.
 # Individual joins may be able to use indexes that would not be usable with OR 
conditions.

The rule only fires when:
 # The OR condition contains only simple equi-join conditions (comparing 
columns between left and right inputs).

Limitations and considerations:
 # May not improve performance if individual joins produce large results that 
need to be union-ed.
 # Multiple hash tables may cause memory pressure.
 # The optimizer should consider statistics and cost estimates when applying 
this rule.

 

supplement:
 # The inner join will perform the conversion as described above.
 # The outer join will perform the conversion more complexly.

       left join = inner join + anti join

 
{code:java}
Project[*]
└── Join[OR(t1.id=t2.id, t1.age=t2.age), left]
    ├── TableScan[t1]
    └── TableScan[t2]

to

Project[*]
└── UnionAll
    ├── Join[t1.id=t2.id, inner]                    
    │   ├── TableScan[t1]
    │   └── TableScan[t2]
    ├── Join[t1.age=t2.age AND t1.id≠t2.id, inner]  
    │   ├── TableScan[t1]
    │   └── TableScan[t2]
    └── Project[t1-side cols + NULLs]                
        └── Join[t1.id=t2.id, anti]                 
            ├── Join[t1.age=t2.age, anti]           
            │   ├── TableScan[t1]
            │   └── TableScan[t2]
            └── TableScan[t2] {code}
 

        full join = inner join + anti join + right anti join
{code:java}
Project[*]
└── Join[OR(t1.id=t2.id, t1.age=t2.age), full]
    ├── TableScan[t1]
    └── TableScan[t2]

to

Project[*]
└── UnionAll
    ├── Join[t1.id=t2.id, inner]                     
    │   ├── TableScan[t1]
    │   └── TableScan[t2]
    ├── Join[t1.age=t2.age AND t1.id≠t2.id, inner]  
    │   ├── TableScan[t1]
    │   └── TableScan[t2]
    ├── Project[t1-side cols + NULLs]               
    │   └── Join[t1.id=t2.id, anti]
    │       ├── Join[t1.age=t2.age, anti]            
    │       │   ├── TableScan[t1]
    │       │   └── TableScan[t2]
    │       └── TableScan[t2]
    └── Project[NULLs + t2-side cols]               
        └── Join[t2.id=t1.id, anti]
            ├── Join[t2.age=t1.age, anti]           
            │   ├── TableScan[t2]
            │   └── TableScan[t1]
            └── TableScan[t1] {code}

  was:
JoinConditionOrExpansionRule transforms a join with OR conditions into a UNION 
ALL of multiple joins.

For example, the SQL:
{code:java}
SELECT * FROM emp JOIN dept 
ON emp.deptno = dept.deptno OR emp.deptno = dept.mgr {code}
Will be transformed into:
{code:java}
SELECT * FROM emp JOIN dept ON emp.deptno = dept.deptno
UNION ALL 
SELECT * FROM emp JOIN dept ON emp.deptno = dept.mgr {code}
The original plan:
{code:java}
LogicalJoin(condition=[OR(=(deptno, deptno), =(deptno, mgr))])
  LogicalTableScan(table=emp)
    LogicalTableScan(table=dept) {code}
Is transformed to:
{code:java}
LogicalUnion(all=[true])
  LogicalJoin(condition=[=(deptno, deptno)])
   LogicalTableScan(table=emp)
   LogicalTableScan(table=dept)
  LogicalJoin(condition=[=(deptno, mgr)])
  LogicalTableScan(table=emp)
  LogicalTableScan(table=dept) {code}
Benefits:
 # Each individual join can use hash-join algorithm, which is more efficient 
than nested-loop joins required for OR conditions.
 # Each join branch can independently choose its optimal join implementation.
 # The joins can be executed in parallel.
 # Individual joins may be able to use indexes that would not be usable with OR 
conditions.

The rule only fires when:
 # The OR condition contains only simple equi-join conditions (comparing 
columns between left and right inputs).

Limitations and considerations:
 # May not improve performance if individual joins produce large results that 
need to be union-ed.
 # Multiple hash tables may cause memory pressure.
 # The optimizer should consider statistics and cost estimates when applying 
this rule.

 

supplement:
 # The inner join will perform the conversion as described above.
 # The outer join will perform the conversion more complexly.

       left join:

!image-2025-04-03-13-36-23-352.png|width=442,height=228!

        full join:

!image-2025-04-03-13-37-16-575.png|width=501,height=199!


> Implementing JoinConditionOrExpansionRule
> -----------------------------------------
>
>                 Key: CALCITE-6930
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6930
>             Project: Calcite
>          Issue Type: New Feature
>            Reporter: Zhen Chen
>            Assignee: Zhen Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.40.0
>
>         Attachments: image-2025-04-03-13-36-23-352.png, 
> image-2025-04-03-13-37-16-575.png
>
>
> JoinConditionOrExpansionRule transforms a join with OR conditions into a 
> UNION ALL of multiple joins.
> For example, the SQL:
> {code:java}
> SELECT * FROM emp JOIN dept 
> ON emp.deptno = dept.deptno OR emp.deptno = dept.mgr {code}
> Will be transformed into:
> {code:java}
> SELECT * FROM emp JOIN dept ON emp.deptno = dept.deptno
> UNION ALL 
> SELECT * FROM emp JOIN dept ON emp.deptno = dept.mgr {code}
> The original plan:
> {code:java}
> LogicalJoin(condition=[OR(=(deptno, deptno), =(deptno, mgr))])
>   LogicalTableScan(table=emp)
>     LogicalTableScan(table=dept) {code}
> Is transformed to:
> {code:java}
> LogicalUnion(all=[true])
>   LogicalJoin(condition=[=(deptno, deptno)])
>    LogicalTableScan(table=emp)
>    LogicalTableScan(table=dept)
>   LogicalJoin(condition=[=(deptno, mgr)])
>   LogicalTableScan(table=emp)
>   LogicalTableScan(table=dept) {code}
> Benefits:
>  # Each individual join can use hash-join algorithm, which is more efficient 
> than nested-loop joins required for OR conditions.
>  # Each join branch can independently choose its optimal join implementation.
>  # The joins can be executed in parallel.
>  # Individual joins may be able to use indexes that would not be usable with 
> OR conditions.
> The rule only fires when:
>  # The OR condition contains only simple equi-join conditions (comparing 
> columns between left and right inputs).
> Limitations and considerations:
>  # May not improve performance if individual joins produce large results that 
> need to be union-ed.
>  # Multiple hash tables may cause memory pressure.
>  # The optimizer should consider statistics and cost estimates when applying 
> this rule.
>  
> supplement:
>  # The inner join will perform the conversion as described above.
>  # The outer join will perform the conversion more complexly.
>        left join = inner join + anti join
>  
> {code:java}
> Project[*]
> └── Join[OR(t1.id=t2.id, t1.age=t2.age), left]
>     ├── TableScan[t1]
>     └── TableScan[t2]
> to
> Project[*]
> └── UnionAll
>     ├── Join[t1.id=t2.id, inner]                    
>     │   ├── TableScan[t1]
>     │   └── TableScan[t2]
>     ├── Join[t1.age=t2.age AND t1.id≠t2.id, inner]  
>     │   ├── TableScan[t1]
>     │   └── TableScan[t2]
>     └── Project[t1-side cols + NULLs]                
>         └── Join[t1.id=t2.id, anti]                 
>             ├── Join[t1.age=t2.age, anti]           
>             │   ├── TableScan[t1]
>             │   └── TableScan[t2]
>             └── TableScan[t2] {code}
>  
>         full join = inner join + anti join + right anti join
> {code:java}
> Project[*]
> └── Join[OR(t1.id=t2.id, t1.age=t2.age), full]
>     ├── TableScan[t1]
>     └── TableScan[t2]
> to
> Project[*]
> └── UnionAll
>     ├── Join[t1.id=t2.id, inner]                     
>     │   ├── TableScan[t1]
>     │   └── TableScan[t2]
>     ├── Join[t1.age=t2.age AND t1.id≠t2.id, inner]  
>     │   ├── TableScan[t1]
>     │   └── TableScan[t2]
>     ├── Project[t1-side cols + NULLs]               
>     │   └── Join[t1.id=t2.id, anti]
>     │       ├── Join[t1.age=t2.age, anti]            
>     │       │   ├── TableScan[t1]
>     │       │   └── TableScan[t2]
>     │       └── TableScan[t2]
>     └── Project[NULLs + t2-side cols]               
>         └── Join[t2.id=t1.id, anti]
>             ├── Join[t2.age=t1.age, anti]           
>             │   ├── TableScan[t2]
>             │   └── TableScan[t1]
>             └── TableScan[t1] {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to