I think the design is to have a separate rule to do that optimization, specifically FilterJoinRule. Your desired outcome for your current example is easy to achieve with composition of existing rules. There is a considerable amount of code to handle all the cases in FilterJoinRule and it would be problematic to maintain duplicate logic.
James On Thu, Jul 27, 2023 at 9:38 AM P.F. ZHAN <[email protected]> wrote: > Suppose there is a sql like this: > > select sum(sal), sum(case when hiredate = '2020-01-01' then sal end) > from ( select hiredate, sal, job > from emp ) a > where 1 = 1 and hiredate = ( select max(hiredate) from emp ) > > The optimized plan as follows: > > LogicalAggregate(group=[{}], EXPR$0=[SUM($0)], EXPR$1=[SUM($1)]) > LogicalProject(SAL=[$1], $f1=[CASE(=($0, 2020-01-01 00:00:00), $1, > null:INTEGER)]) > LogicalProject(HIREDATE=[$0], SAL=[$1], JOB=[$2]) > LogicalFilter(condition=[=($0, $3)]) > LogicalJoin(condition=[true], joinType=[left]) > LogicalProject(HIREDATE=[$4], SAL=[$5], JOB=[$2]) > LogicalTableScan(table=[[CATALOG, SALES, EMP]]) > LogicalAggregate(group=[{}], EXPR$0=[MAX($0)]) > LogicalProject(HIREDATE=[$4]) > LogicalTableScan(table=[[CATALOG, SALES, EMP]]) > > Maybe we can push the LogicalFilter to the LogicalJoin, and get the > LogicalJoin(condition=[=($0, > $3)], joinType=[inner]) > It seems this is much better. > > Can I do this? If so I will create an issue to resolve this problem. >
