[ https://issues.apache.org/jira/browse/CALCITE-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806704#comment-16806704 ]
Ruben Quesada Lopez commented on CALCITE-2967:
----------------------------------------------
[~vladimirsitnikov], theoretically yes, but the problem is that, if it is not a
SemiJoin, then the plan is expected to return the Department fields as well, so I
guess in that case we could remove neither the Department scan nor the Join
operation.
The reason for focusing this rule on SemiJoins is that we know for sure that only
Employee fields will be output, so it might be safe to remove the Department
scan (under the appropriate circumstances). I'm not sure whether we could generalize
this rule beyond SemiJoin...
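To illustrate the point above on toy in-memory data: an inner join produces one row per matching (Employee, Department) pair, so its output still carries Department fields, whereas a semi-join outputs each Employee row at most once and nothing else. The `Employee`/`Department` records and the `join`/`semiJoin` helpers are hypothetical and are not Calcite classes; this is a sketch of the semantics only, not of how Calcite executes either operator.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class SemiJoinVsJoin {
    // Hypothetical toy row types for illustration; not Calcite classes.
    record Employee(String name, int deptno) {}
    record Department(int deptno, String name) {}
    // Output row of an inner join: it carries the Department fields too.
    record EmpDept(Employee emp, Department dept) {}

    // Inner join on deptno: one output row per matching (Employee, Department) pair.
    static List<EmpDept> join(List<Employee> emps, List<Department> depts) {
        return emps.stream()
                .flatMap(e -> depts.stream()
                        .filter(d -> d.deptno() == e.deptno())
                        .map(d -> new EmpDept(e, d)))
                .collect(Collectors.toList());
    }

    // Semi-join on deptno: each Employee row at most once, with only Employee fields.
    static List<Employee> semiJoin(List<Employee> emps, List<Department> depts) {
        Set<Integer> deptnos = depts.stream()
                .map(Department::deptno)
                .collect(Collectors.toSet());
        return emps.stream()
                .filter(e -> deptnos.contains(e.deptno()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Employee> emps = List.of(
                new Employee("Ann", 10), new Employee("Bob", 20), new Employee("Cid", 30));
        List<Department> depts = List.of(
                new Department(10, "Sales"), new Department(20, "R&D"));
        System.out.println(join(emps, depts));     // rows still reference Department
        System.out.println(semiJoin(emps, depts)); // only Employee rows
    }
}
```

Because the semi-join's output type is just the left input's row type, a rule that drops the right input does not have to invent values for any Department columns; with an inner join it would.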
> New rule to remove SemiJoin
> ---------------------------
>
> Key: CALCITE-2967
> URL: https://issues.apache.org/jira/browse/CALCITE-2967
> Project: Calcite
> Issue Type: New Feature
> Reporter: Ruben Quesada Lopez
> Assignee: Ruben Quesada Lopez
> Priority: Major
>
> (As far as I know, there is no rule to achieve what I am about to describe;
> if a way to do it already exists, please let me know.)
> In some specific situations, a SemiJoin can be completely removed and
> replaced by its left child with an appropriate transformed filter.
> Let us say we want to retrieve all employees whose department satisfies a
> certain condition, i.e.:
> {code}
> SELECT * FROM Employee e
> WHERE e.deptno IN
>   (SELECT d.deptno FROM Department d
>    WHERE <condition>)
> {code}
> This would translate to a plan like:
> {code}
> SemiJoin(e.deptno=d.deptno)
>   Scan(table=Employee as e)
>   Filter(<condition>)
>     Scan(table=Department as d)
> {code}
> In a "normal" scenario, e.g. "all employees from the Sales department", the plan
> could not be simplified:
> {code}
> SemiJoin(e.deptno=d.deptno)
>   Scan(table=Employee as e)
>   Filter(d.name="Sales")
>     Scan(table=Department as d)
> {code}
> But with a specific condition, based on deptno, e.g. "all employees whose
> deptno is greater than 10":
> {code}
> SemiJoin(e.deptno=d.deptno)
>   Scan(table=Employee as e)
>   Filter(d.deptno>10)
>     Scan(table=Department as d)
> {code}
> The plan could be simplified: the SemiJoin is not actually needed, we can
> perform the query with a single scan and a converted filter:
> {code}
> Filter(e.deptno>10)
>   Scan(table=Employee as e)
> {code}
> The goal would be to provide a new rule to achieve that (since there is
> already a SemiJoinRemoveRule, we could name this new rule something like
> SemiJoinSimplifyRule?).
> I know that, ideally, this rule should not be needed because the plan could
> be directly written without the SemiJoin, but let's say that we are in a
> situation where the plan is systematically generated with the same pattern,
> and there is no way to know in advance which filter condition will be
> used within.
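A minimal sketch of the rewrite proposed in the issue, in Java since Calcite is a Java project. It uses a hypothetical toy plan model (`ColumnRef`, `Comparison`, `SemiJoinPlan`, `FilterPlan`), not Calcite's `RelNode`/`RexNode` API, and only handles a single comparison on the right-hand side. It checks that the right-hand filter references only the right join key, which is why the "Sales" example is rejected. It also assumes, without checking, that every Employee.deptno refers to an existing Department row (for example via a foreign key): without that guarantee, an employee with deptno > 10 and no matching department would be dropped by the SemiJoin but kept by the simplified Filter.

```java
import java.util.Optional;

public class SemiJoinSimplifySketch {
    // Hypothetical toy model for illustration only; NOT Calcite's RelNode/RexNode classes.
    record ColumnRef(String alias, String column) {
        @Override public String toString() { return alias + "." + column; }
    }
    // A single comparison "<column> <op> <literal>", e.g. d.deptno > 10.
    record Comparison(ColumnRef column, String op, Object literal) {}
    // SemiJoin(leftKey = rightKey) over Scan(left) and Filter(rightFilter) over Scan(right).
    record SemiJoinPlan(ColumnRef leftKey, ColumnRef rightKey, Comparison rightFilter) {}
    // Filter(condition) over Scan(scanAlias).
    record FilterPlan(String scanAlias, Comparison condition) {}

    /**
     * Returns the simplified plan, or empty when the rewrite does not apply.
     * Checked: the right-hand filter references only the right join key.
     * ASSUMED, not checked: every left key value also exists on the right side
     * (e.g. a foreign key from Employee.deptno to Department.deptno). Without that
     * guarantee, removing the Department scan could return extra employees.
     */
    static Optional<FilterPlan> simplify(SemiJoinPlan plan) {
        Comparison f = plan.rightFilter();
        if (!f.column().equals(plan.rightKey())) {
            return Optional.empty(); // e.g. d.name = 'Sales': Department must still be scanned
        }
        // The join condition equates the keys, so the predicate can be moved to the
        // left key: d.deptno > 10 becomes e.deptno > 10.
        Comparison moved = new Comparison(plan.leftKey(), f.op(), f.literal());
        return Optional.of(new FilterPlan(plan.leftKey().alias(), moved));
    }

    public static void main(String[] args) {
        ColumnRef eDeptno = new ColumnRef("e", "deptno");
        ColumnRef dDeptno = new ColumnRef("d", "deptno");
        SemiJoinPlan byDeptno = new SemiJoinPlan(eDeptno, dDeptno,
                new Comparison(dDeptno, ">", 10));
        SemiJoinPlan byName = new SemiJoinPlan(eDeptno, dDeptno,
                new Comparison(new ColumnRef("d", "name"), "=", "Sales"));
        System.out.println(simplify(byDeptno)); // rewritten to a filter on e.deptno
        System.out.println(simplify(byName));   // Optional.empty: not simplifiable
    }
}
```

A real rule would additionally have to verify the referential-integrity assumption (or restrict itself to cases where it is known to hold) and handle compound conditions, for instance by only moving the conjuncts that reference the join key.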
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)