I think the design is to have a separate rule to do that optimization,
specifically FilterJoinRule.  Your desired outcome for your current
example is easy to achieve with composition of existing rules.  There is a
considerable amount of code to handle all the cases in FilterJoinRule and
it would be problematic to maintain duplicate logic.

James

On Thu, Jul 27, 2023 at 9:38 AM P.F. ZHAN <[email protected]> wrote:

> Suppose there is a sql like this:
>
> select sum(sal), sum(case when hiredate = '2020-01-01' then sal end)
> from ( select hiredate, sal, job
>     from emp ) a
> where 1 = 1 and hiredate = ( select max(hiredate) from emp )
>
> The optimized plan as follows:
>
> LogicalAggregate(group=[{}], EXPR$0=[SUM($0)], EXPR$1=[SUM($1)])
>   LogicalProject(SAL=[$1], $f1=[CASE(=($0, 2020-01-01 00:00:00), $1,
> null:INTEGER)])
>     LogicalProject(HIREDATE=[$0], SAL=[$1], JOB=[$2])
>       LogicalFilter(condition=[=($0, $3)])
>         LogicalJoin(condition=[true], joinType=[left])
>           LogicalProject(HIREDATE=[$4], SAL=[$5], JOB=[$2])
>             LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>           LogicalAggregate(group=[{}], EXPR$0=[MAX($0)])
>             LogicalProject(HIREDATE=[$4])
>               LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>
> Maybe we can push the LogicalFilter to the LogicalJoin, and get the
> LogicalJoin(condition=[=($0,
> $3)], joinType=[inner])
> It seems this is much better.
>
> Can I do this? If so I will create an issue to resolve this problem.
>

Reply via email to