Ruben Quesada Lopez created CALCITE-2967:
--------------------------------------------

             Summary: New rule to remove SemiJoin
                 Key: CALCITE-2967
                 URL: https://issues.apache.org/jira/browse/CALCITE-2967
             Project: Calcite
          Issue Type: New Feature
            Reporter: Ruben Quesada Lopez


(As far as I know, there is no rule to achieve what I am about to describe; if 
a way to do it already exists, please let me know.)

In some specific situations, a SemiJoin can be removed entirely and replaced 
by its left child plus an appropriately transformed filter.
Let us say we want to retrieve all employees whose department satisfies a certain 
condition, i.e.:
{code}
SELECT * FROM Employee e
WHERE e.deptno IN
  (SELECT d.deptno FROM Department d
    WHERE <condition>)
{code}

This would translate to something like:
{code}
SemiJoin (e.deptno=d.deptno)
    Scan (table=Employee as e)
    Filter(<condition>)
        Scan (table=Department as d)
{code}

In a "normal" scenario, e.g. "all employees from Sales department", the plan 
could not be simplified:
{code}
SemiJoin (e.deptno=d.deptno)
    Scan (table=Employee as e)
    Filter(d.name="Sales")
        Scan (table=Department as d)
{code}

But with a specific condition, based on deptno, e.g. "all employees whose 
deptno is greater than 10":
{code}
SemiJoin (e.deptno=d.deptno)
    Scan (table=Employee as e)
    Filter(d.deptno>10)
        Scan (table=Department as d)
{code}
The plan could be simplified: the SemiJoin is not actually needed, because we 
can perform the query with a single scan and a converted filter:
{code}
Filter(e.deptno>10)
    Scan (table=Employee as e)
{code}
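
To make the equivalence concrete, here is a minimal sketch in Python over toy in-memory tables. The rows and the semi_join helper are hypothetical, for illustration only. It also makes explicit an assumption that is not stated above: the two plans agree only if every e.deptno has a matching row in Department (e.g. a foreign key). If that does not hold, the filter-only plan can return extra rows.

```python
# Toy tables (hypothetical data, for illustration only).
employees = [
    {"name": "Ann", "deptno": 10},
    {"name": "Bob", "deptno": 20},
    {"name": "Cid", "deptno": 30},
    {"name": "Dee", "deptno": 20},
]
departments = [
    {"deptno": 10, "name": "Sales"},
    {"deptno": 20, "name": "Marketing"},
    {"deptno": 30, "name": "Research"},
]


def semi_join(left, right, key):
    """Keep each left row that has at least one match in right on key."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] in right_keys]


# Original plan: SemiJoin(Scan(Employee), Filter(d.deptno > 10, Scan(Department)))
filtered_departments = [d for d in departments if d["deptno"] > 10]
with_semi_join = semi_join(employees, filtered_departments, "deptno")

# Simplified plan: Filter(e.deptno > 10, Scan(Employee))
with_filter_only = [e for e in employees if e["deptno"] > 10]

# Every employee's deptno exists in Department, so both plans agree.
assert with_semi_join == with_filter_only

# Counter-case: an employee whose deptno has no Department row.
# The SemiJoin drops that row, the plain filter keeps it.
with_orphan = employees + [{"name": "Eve", "deptno": 40}]
orphan_semi_join = semi_join(with_orphan, filtered_departments, "deptno")
orphan_filter_only = [e for e in with_orphan if e["deptno"] > 10]
assert orphan_semi_join != orphan_filter_only
```

The counter-case suggests that such a rule would need some guarantee about the join key (or would have to accept the different semantics) before dropping the SemiJoin.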

The goal would be to provide a new rule to achieve that (since there is already 
a SemiJoinRemoveRule, we could name this new rule e.g. 
SemiJoinSimplifyRule?).
I know that, ideally, this rule should not be needed because the plan could be 
written directly without the SemiJoin, but let's say that we are in a situation 
where the plan is systematically generated with the same pattern, and there is 
no way to know in advance the filter condition that will be used within it.
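
As a rough illustration of what such a rule would match and produce, here is a sketch in Python on a toy plan tree. It does not use Calcite's rule API; the Scan, Filter and SemiJoin classes and the simplify_semi_join function are hypothetical stand-ins. It assumes a single-column equi-join and a filter that is one comparison on one column, and it does not check whether every left key has a match on the right side.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Scan:
    table: str
    alias: str


@dataclass(frozen=True)
class Filter:
    column: str   # qualified column, e.g. "d.deptno"
    op: str       # comparison operator, e.g. ">"
    value: object
    input: "Plan"


@dataclass(frozen=True)
class SemiJoin:
    left_key: str   # e.g. "e.deptno"
    right_key: str  # e.g. "d.deptno"
    left: "Plan"
    right: "Plan"


Plan = Union[Scan, Filter, SemiJoin]


def simplify_semi_join(plan):
    """Rewrite SemiJoin(L, Filter(c, Scan)) into Filter(c', L).

    Applies only when the filter c tests the right-hand join key;
    c' is the same comparison expressed on the left-hand join key.
    Any other plan is returned unchanged.
    """
    if (isinstance(plan, SemiJoin)
            and isinstance(plan.right, Filter)
            and isinstance(plan.right.input, Scan)
            and plan.right.column == plan.right_key):
        right_filter = plan.right
        return Filter(plan.left_key, right_filter.op,
                      right_filter.value, plan.left)
    return plan


emp = Scan("Employee", "e")
dept = Scan("Department", "d")

# Filter on the join key: the SemiJoin is replaced by a filter on Employee.
by_deptno = SemiJoin("e.deptno", "d.deptno", emp,
                     Filter("d.deptno", ">", 10, dept))
assert simplify_semi_join(by_deptno) == Filter("e.deptno", ">", 10, emp)

# Filter on another column (the "Sales" case): the plan is left as it is.
by_name = SemiJoin("e.deptno", "d.deptno", emp,
                   Filter("d.name", "=", "Sales", dept))
assert simplify_semi_join(by_name) == by_name
```

A real rule would have to handle compound conditions and multi-column join keys; the sketch only covers the two example plans from this description.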




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
