Julian Hyde commented on CALCITE-1495:

We already have SemiJoinRule.

I'm a bit skeptical about this one, for a few reasons:
* To apply the rule, you need to know that none of the columns on the RHS are 
being used.
* SemiJoin is a second-class citizen. There are far fewer rules for it than 
for Join. There are occasions when we want to treat SemiJoin as a kind of Join 
(e.g. heuristic join re-ordering).
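A minimal sketch of the first precondition, in plain Java rather than Calcite's API. The class `RhsUsageCheck` and its method are hypothetical. It assumes the join's output fields are numbered LHS first, then RHS (as in the plan quoted below, where EMP supplies `$0`..`$8` and the aggregated DEPTNO is `$9`), and that the field indices referenced above the join have already been collected.

```java
import java.util.Set;

/** Hypothetical helper, not part of Calcite: checks that no RHS column is used. */
public final class RhsUsageCheck {
    private RhsUsageCheck() {}

    /**
     * Returns true if none of the join output fields used by the join's consumer
     * come from the right-hand input. Assumes LHS fields are numbered
     * 0 .. leftFieldCount - 1 and RHS fields follow them.
     */
    public static boolean rhsColumnsUnused(Set<Integer> usedFields, int leftFieldCount) {
        for (int field : usedFields) {
            if (field >= leftFieldCount) {
                return false; // this field belongs to the RHS
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // EMP has 9 columns ($0..$8); the aggregated DEPTNO is $9.
        System.out.println(rhsColumnsUnused(Set.of(5), 9));    // true: SAL only
        System.out.println(rhsColumnsUnused(Set.of(5, 9), 9)); // false: also uses $9
    }
}
```

In the quoted plan only `SAL=[$5]` survives the top Project, so this check would pass there; the second bullet's concern about SemiJoin being second-class is unaffected by it.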

I suppose that if none of the columns on the RHS are used, we could push that 
Project through the join and join the LHS to a 0-column RHS (or a k-column RHS, 
where k is the cardinality of the join key, and decide not to project the join 

We shouldn't look for an Aggregate specifically; we should look for a RHS that 
is unique (perhaps by means of Aggregate, perhaps by other means).
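To illustrate why uniqueness of the RHS, rather than an Aggregate as such, is what matters, here is a minimal sketch in plain Java (not Calcite code; the class and method names are hypothetical). It compares, on single-column integer keys, an inner join whose output keeps only LHS columns with a semi-join.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of join semantics on key lists; not Calcite code. */
public final class SemiJoinEquivalence {
    private SemiJoinEquivalence() {}

    /** Inner join on key equality that outputs only the LHS key (RHS columns unused). */
    public static List<Integer> innerJoinLhsOnly(List<Integer> lhs, List<Integer> rhs) {
        List<Integer> out = new ArrayList<>();
        for (int left : lhs) {
            for (int right : rhs) {
                if (left == right) {
                    out.add(left); // one output row per matching RHS row
                }
            }
        }
        return out;
    }

    /** Semi-join: emits each LHS row at most once, if any RHS row matches. */
    public static List<Integer> semiJoin(List<Integer> lhs, List<Integer> rhs) {
        Set<Integer> rhsKeys = new HashSet<>(rhs);
        List<Integer> out = new ArrayList<>();
        for (int left : lhs) {
            if (rhsKeys.contains(left)) {
                out.add(left);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> lhs = List.of(10, 20, 30);
        // Unique RHS keys: the two plans agree.
        System.out.println(innerJoinLhsOnly(lhs, List.of(10, 30)));     // [10, 30]
        System.out.println(semiJoin(lhs, List.of(10, 30)));             // [10, 30]
        // Duplicate RHS keys: the inner join repeats LHS row 10.
        System.out.println(innerJoinLhsOnly(lhs, List.of(10, 10, 30))); // [10, 10, 30]
        System.out.println(semiJoin(lhs, List.of(10, 10, 30)));         // [10, 30]
    }
}
```

Whatever makes the RHS keys unique (an Aggregate, a primary key, or something else), the inner join then emits each LHS row at most once, which is exactly the semi-join's behavior.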

Can you give an example of an optimization that this rule makes possible?

> Add a rule to convert INNER JOIN preceded by GROUP BY to appropriate SEMI-JOIN
> ------------------------------------------------------------------------------
>                 Key: CALCITE-1495
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1495
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>            Reporter: Vineet Garg
>            Assignee: Julian Hyde
> For IN and EXISTS subqueries, Calcite currently generates a plan consisting of 
> a GROUP BY on the inner table followed by an INNER JOIN with the outer table.
> For example, for the following query:
> {noformat} select sal from emp where empno IN (select deptno from dept) 
> {noformat}
> The generated plan is:
> {noformat}
> LogicalProject(SAL=[$5])
>   LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
> SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
>     LogicalJoin(condition=[=($0, $9)], joinType=[inner])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>       LogicalAggregate(group=[{0}])
>         LogicalProject(DEPTNO=[$0])
>           LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {noformat}
> Such cases could be converted by this rule to use a SEMI-JOIN, making them 
> more efficient.
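A toy model of the quoted query, assuming hypothetical in-memory EMP and DEPT rows (only the columns the query touches) and hypothetical class and method names; it is not Calcite code and does not execute the plans above. It traces the current plan (Aggregate on DEPTNO, inner join, project SAL) next to a semi-join and shows that they return the same SAL values.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical toy model of the query in the issue; not Calcite code. */
public final class InSubqueryPlans {
    private InSubqueryPlans() {}

    /** Only the EMP columns the query touches (hypothetical subset). */
    public record Emp(int empno, int sal) {}

    public record Dept(int deptno) {}

    /** Current plan: GROUP BY on DEPT.DEPTNO, inner join with EMP, project SAL. */
    public static List<Integer> aggregateThenJoin(List<Emp> emps, List<Dept> depts) {
        // Mirrors LogicalAggregate(group=[{0}]): distinct DEPTNO values.
        Set<Integer> grouped = new LinkedHashSet<>();
        for (Dept d : depts) {
            grouped.add(d.deptno());
        }
        List<Integer> sals = new ArrayList<>();
        for (Emp e : emps) {
            // Mirrors LogicalJoin(condition=[=($0, $9)]) as a nested loop.
            for (int deptno : grouped) {
                if (e.empno() == deptno) {
                    sals.add(e.sal()); // mirrors LogicalProject(SAL=[$5])
                }
            }
        }
        return sals;
    }

    /** Proposed plan: keep each EMP row at most once if some DEPT row matches. */
    public static List<Integer> semiJoin(List<Emp> emps, List<Dept> depts) {
        Set<Integer> deptnos = new HashSet<>();
        for (Dept d : depts) {
            deptnos.add(d.deptno());
        }
        List<Integer> sals = new ArrayList<>();
        for (Emp e : emps) {
            if (deptnos.contains(e.empno())) {
                sals.add(e.sal());
            }
        }
        return sals;
    }

    public static void main(String[] args) {
        List<Emp> emps = List.of(new Emp(10, 1000), new Emp(20, 2000), new Emp(30, 3000));
        List<Dept> depts = List.of(new Dept(10), new Dept(30), new Dept(40));
        System.out.println(aggregateThenJoin(emps, depts)); // [1000, 3000]
        System.out.println(semiJoin(emps, depts));          // [1000, 3000]
    }
}
```

The sketch only shows that the two plans agree on results; whether the semi-join is actually cheaper depends on the executor and is not something this toy model measures.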

This message was sent by Atlassian JIRA