[jira] [Commented] (CALCITE-6363) Introduce a rule to derive more filters from inner join condition

ruanhui (Jira) Tue, 16 Apr 2024 06:28:04 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837713#comment-17837713
 ]


ruanhui commented on CALCITE-6363:
----------------------------------

Thanks for your reply [~julianhyde] 

{quote}I would like to see some test cases for left and right joins. It is 
possible to move conditions across outer joins, in some cases.{quote}

OK. Let me verify whether the rule can also be applied to outer joins.

{quote}I don't believe that all the changes to RexNormalize and RexUtil are 
necessary.{quote}

I think you mean the new static method RexUtil.canonizeNode? Its purpose is to 
put expressions into a canonical order. During rewriting, the rule can 
generate expressions that are *different* but *equivalent*. For example, we 
may get $1 > $2 and $2 < $1, $1 = $2 and $2 = $1, or AND(condition1, 
condition2) and AND(condition2, condition1). To remove such duplicates, I 
introduced this method, which applies the following rules:
a. a literal/constant is always on the right, such as: 10 > $1 -> $1 < 10
b. the input ref with the smaller index is on the left, such as: $1 = $0 -> $0 = $1
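
To make the two rules concrete, here is a minimal standalone sketch. It is not 
the RexUtil.canonizeNode code from the pull request and it does not use any 
Calcite classes: operands are plain strings, and the names CanonizeSketch, 
canonize and canonizeAnd are hypothetical. Sorting AND operands 
lexicographically is also only an assumption, chosen to show one way of making 
AND(condition1, condition2) and AND(condition2, condition1) compare equal.

```java
import java.util.Collection;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical illustration only; operands are strings, not RexNodes. */
final class CanonizeSketch {
  // Operator to use once the two operands have been swapped.
  private static final Map<String, String> REVERSE = Map.of(
      "=", "=", "<>", "<>", "<", ">", ">", "<", "<=", ">=", ">=", "<=");

  /** Treats an operand of the form "$3" as an input ref, anything else as a literal. */
  static boolean isInputRef(String operand) {
    return operand.matches("\\$\\d+");
  }

  static int index(String inputRef) {
    return Integer.parseInt(inputRef.substring(1));
  }

  /** Applies rule (a) and rule (b) to one binary comparison. */
  static String canonize(String left, String op, String right) {
    String reversed = REVERSE.get(op);
    if (reversed == null) {
      throw new IllegalArgumentException("unsupported operator: " + op);
    }
    boolean swap;
    if (isInputRef(left) && isInputRef(right)) {
      // Rule (b): the input ref with the smaller index goes on the left.
      swap = index(left) > index(right);
    } else {
      // Rule (a): a literal on the left moves to the right.
      swap = !isInputRef(left) && isInputRef(right);
    }
    return swap
        ? right + " " + reversed + " " + left
        : left + " " + op + " " + right;
  }

  /** Assumption: AND operands are deduplicated and sorted as strings. */
  static String canonizeAnd(Collection<String> operands) {
    return "AND(" + String.join(", ", new TreeSet<>(operands)) + ")";
  }

  public static void main(String[] args) {
    System.out.println(canonize("10", ">", "$1")); // $1 < 10
    System.out.println(canonize("$1", "=", "$0")); // $0 = $1
    System.out.println(canonize("$2", "<", "$1")); // $1 > $2
  }
}
```

With this ordering, $1 > $2 and $2 < $1 both become "$1 > $2", so a set of 
canonized predicates drops the duplicate.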

Do you have any suggestions on how to do this better, or is there already a 
better API in the codebase?
 
Thanks.

> Introduce a rule to derive more filters from inner join condition
> -----------------------------------------------------------------
>
>                 Key: CALCITE-6363
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6363
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>            Reporter: ruanhui
>            Priority: Minor
>              Labels: pull-request-available
>
> Sometimes we can infer more predicates from an inner join. For example, in 
> the query
> SELECT * FROM ta INNER JOIN tb ON ta.x = tb.y WHERE ta.x > 10
> we can infer the condition tb.y > 10 and push it down to the table tb.
> This can reduce the amount of data involved in the join.
> To achieve this, here is my idea.
> The core data structures are two Multimaps:
> predicateMap : maps an inputRef to its predicates, such as: $1 -> 
> [$1 > 10, $1 < 20, $1 = $2]
> equivalenceMap : maps an inputRef to its equivalent values or 
> inputRefs, such as: $1 -> [$2, 1]
> The filter derivation is divided into 4 steps:
> 1. construct the predicate map and the equivalence map by traversing all 
> conjunctions in the condition
> 2. search the maps and rewrite predicates with equivalent inputRefs or literals
> 2.1 find all inputRefs that are equivalent to the current inputRef, and then 
> rewrite all predicates involving those equivalent inputRefs using the current 
> inputRef. For example, if we have inputRef $1 = equivInputRef $2, then we can 
> rewrite {$2 = 10} to {$1 = 10}.
> 2.2 find all predicates involving the current inputRef. If a predicate refers 
> to another inputRef, rewrite the predicate with the literal/constant 
> equivalent to that inputRef. For example, if we have {$1 > $2} and {$2 = 10}, 
> then we can infer the new condition {$1 > 10}.
> 2.3 derive new predicates based on the equivalence relation in equivalenceMap
> 3. compose all original predicates and derived predicates
> 4. simplify the expression, for example by range merging: {$1 > 10 AND 
> $1 > 20} => {$1 > 20}, {$1 > $2 AND $1 > $2} => {$1 > $2}
> If anyone is interested in this, please feel free to comment on this issue.
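
The steps quoted above can be sketched in a few lines of standalone Java. This 
is only an illustration under stated assumptions, not the rule from the pull 
request: DeriveFiltersSketch, Pred and derive are hypothetical names, 
predicates are simplified to "inputRef op operand" strings, and only steps 1, 
2.1 and 3 are covered. Equivalences are followed one hop (no transitive 
closure), and step 2.2 and the simplification in step 4 are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; does not use Calcite classes. */
final class DeriveFiltersSketch {
  /** Predicate "ref op operand"; the operand is a literal or another "$n". */
  static final class Pred {
    final String ref;
    final String op;
    final String operand;

    Pred(String ref, String op, String operand) {
      this.ref = ref;
      this.op = op;
      this.operand = operand;
    }

    boolean operandIsRef() {
      return operand.startsWith("$");
    }

    @Override public String toString() {
      return ref + " " + op + " " + operand;
    }
  }

  static Set<String> derive(List<Pred> conjuncts) {
    // Step 1: build equivalenceMap (ref -> equivalent refs) and
    // predicateMap (ref -> predicates comparing that ref with a literal).
    Map<String, Set<String>> equivalenceMap = new LinkedHashMap<>();
    Map<String, List<Pred>> predicateMap = new LinkedHashMap<>();
    for (Pred p : conjuncts) {
      if (p.op.equals("=") && p.operandIsRef()) {
        equivalenceMap.computeIfAbsent(p.ref, k -> new LinkedHashSet<>()).add(p.operand);
        equivalenceMap.computeIfAbsent(p.operand, k -> new LinkedHashSet<>()).add(p.ref);
      } else if (!p.operandIsRef()) {
        predicateMap.computeIfAbsent(p.ref, k -> new ArrayList<>()).add(p);
      }
    }

    // Step 3 (done first here): keep every original predicate.
    Set<String> result = new LinkedHashSet<>();
    for (Pred p : conjuncts) {
      result.add(p.toString());
    }

    // Step 2.1: copy each literal predicate onto the directly equivalent refs.
    for (Map.Entry<String, Set<String>> e : equivalenceMap.entrySet()) {
      for (Pred p : predicateMap.getOrDefault(e.getKey(), List.of())) {
        for (String equivalentRef : e.getValue()) {
          result.add(new Pred(equivalentRef, p.op, p.operand).toString());
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Assumes ta.x is $0 and tb.y is $1 in the join row type.
    System.out.println(derive(List.of(
        new Pred("$0", "=", "$1"),
        new Pred("$0", ">", "10")))); // [$0 = $1, $0 > 10, $1 > 10]
  }
}
```

For the query in the description, the derived "$1 > 10" corresponds to 
tb.y > 10, which is the filter that could then be pushed down to tb.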



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
