I run some test in Calcite.
SQL:
select d.deptno, e.deptno
from sales.dept d, sales.emp e
where d.deptno = e.deptno
RelNode:
LogicalProject(DEPTNO=[$0], DEPTNO0=[$9])
LogicalFilter(condition=[=($0, $9)])
LogicalJoin(condition=[true], joinType=[inner])
LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
After CoreRules.FILTER_INTO_JOIN and
CoreRules.JOIN_PUSH_EXPRESSIONS hep optimize
The RelNode is:
LogicalProject(DEPTNO=[$0], DEPTNO0=[$9])
LogicalJoin(condition=[=($0, $9)], joinType=[inner])
LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
I agree with James Starr, The FilterJoinRule seems to be valid for this scene.
JingDas.
> 2023年7月28日 上午12:38,P.F. ZHAN <[email protected]> 写道:
>
> Suppose there is a sql like this:
>
> select sum(sal), sum(case when hiredate = '2020-01-01' then sal end)
> from ( select hiredate, sal, job
> from emp ) a
> where 1 = 1 and hiredate = ( select max(hiredate) from emp )
>
> The optimized plan as follows:
>
> LogicalAggregate(group=[{}], EXPR$0=[SUM($0)], EXPR$1=[SUM($1)])
> LogicalProject(SAL=[$1], $f1=[CASE(=($0, 2020-01-01 00:00:00), $1,
> null:INTEGER)])
> LogicalProject(HIREDATE=[$0], SAL=[$1], JOB=[$2])
> LogicalFilter(condition=[=($0, $3)])
> LogicalJoin(condition=[true], joinType=[left])
> LogicalProject(HIREDATE=[$4], SAL=[$5], JOB=[$2])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{}], EXPR$0=[MAX($0)])
> LogicalProject(HIREDATE=[$4])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>
> Maybe we can push the LogicalFilter to the LogicalJoin, and get the
> LogicalJoin(condition=[=($0,
> $3)], joinType=[inner])
> It seems this is much better.
>
> Can I do this? If so I will create an issue to resolve this problem.