[ 
https://issues.apache.org/jira/browse/CALCITE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17944494#comment-17944494
 ] 

Mihai Budiu commented on CALCITE-6930:
--------------------------------------

It looks to me like TPC-H Q19 will benefit a lot from this optimization.
Here is the query:
{code:sql}
create view q19 (
    revenue
) as
select
    sum(l_extendedprice* (1 - l_discount)) as revenue
from
    lineitem,
    part
where
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#22'
        and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
        and l_quantity >= 8 and l_quantity <= 8 + 10
        and p_size between 1 and 5
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#23'
        and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
        and l_quantity >= 10 and l_quantity <= 10 + 10
        and p_size between 1 and 10
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#12'
        and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
        and l_quantity >= 24 and l_quantity <= 24 + 10
        and p_size between 1 and 15
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    );
{code}


> Implementing JoinConditionOrExpansionRule
> -----------------------------------------
>
>                 Key: CALCITE-6930
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6930
>             Project: Calcite
>          Issue Type: New Feature
>            Reporter: Zhen Chen
>            Assignee: Zhen Chen
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2025-04-03-13-36-23-352.png, 
> image-2025-04-03-13-37-16-575.png
>
>
> JoinConditionOrExpansionRule transforms a join with OR conditions into a 
> UNION ALL of multiple joins.
> For example, the SQL:
> {code:java}
> SELECT * FROM emp JOIN dept 
> ON emp.deptno = dept.deptno OR emp.deptno = dept.mgr {code}
> Will be transformed into:
> {code:java}
> SELECT * FROM emp JOIN dept ON emp.deptno = dept.deptno
> UNION ALL 
> SELECT * FROM emp JOIN dept ON emp.deptno = dept.mgr {code}
> The original plan:
> {code:java}
> LogicalJoin(condition=[OR(=(deptno, deptno), =(deptno, mgr))])
>   LogicalTableScan(table=emp)
>     LogicalTableScan(table=dept) {code}
> Is transformed to:
> {code:java}
> LogicalUnion(all=[true])
>   LogicalJoin(condition=[=(deptno, deptno)])
>    LogicalTableScan(table=emp)
>    LogicalTableScan(table=dept)
>   LogicalJoin(condition=[=(deptno, mgr)])
>   LogicalTableScan(table=emp)
>   LogicalTableScan(table=dept) {code}
> Benefits:
>  # Each individual join can use hash-join algorithm, which is more efficient 
> than nested-loop joins required for OR conditions.
>  # Each join branch can independently choose its optimal join implementation.
>  # The joins can be executed in parallel.
>  # Individual joins may be able to use indexes that would not be usable with 
> OR conditions.
> The rule only fires when:
>  # The OR condition contains only simple equi-join conditions (comparing 
> columns between left and right inputs).
> Limitations and considerations:
>  # May not improve performance if individual joins produce large results that 
> need to be union-ed.
>  # Multiple hash tables may cause memory pressure.
>  # The optimizer should consider statistics and cost estimates when applying 
> this rule.
>  
> supplement:
>  # The inner join will perform the conversion as described above.
>  # The outer join will perform the conversion more complexly.
>        left join:
> !image-2025-04-03-13-36-23-352.png|width=442,height=228!
>         full join:
> !image-2025-04-03-13-37-16-575.png|width=501,height=199!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to