amaliujia commented on a change in pull request #1995: URL: https://github.com/apache/calcite/pull/1995#discussion_r433631323
########## File path: core/src/test/resources/org/apache/calcite/test/TopDownOptTest.xml ########## @@ -648,4 +648,80 @@ EnumerableMergeJoin(condition=[AND(=($1, $5), =($0, $4))], joinType=[inner]) ]]> </Resource> </TestCase> + <TestCase name="testHashJoinInnerJoinNotPushDownSort"> + <Resource name="sql"> + <![CDATA[ +"select * from +sales.emp r join sales.bonus s on r.ename=s.ename and r.job=s.job +order by r.job desc nulls last, r.ename nulls first +]]> + </Resource> + <Resource name="planBefore"> + <![CDATA[ +LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first]) + LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12]) + LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner]) + LogicalTableScan(table=[[CATALOG, SALES, EMP]]) + LogicalTableScan(table=[[CATALOG, SALES, BONUS]]) +]]> + </Resource> + <Resource name="planAfter"> + <![CDATA[ +EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first]) + EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner]) + EnumerableTableScan(table=[[CATALOG, SALES, EMP]]) + EnumerableTableScan(table=[[CATALOG, SALES, BONUS]]) +]]> + </Resource> + </TestCase> + <TestCase name="testHashJoinFullOuterJoinNotPushDownSort"> + <Resource name="sql"> + <![CDATA[ +"select * from +sales.emp r full outer join sales.bonus s on r.ename=s.ename and r.job=s.job +order by r.job desc nulls last, r.ename nulls first +]]> + </Resource> + <Resource name="planBefore"> + <![CDATA[ +LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first]) + LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12]) + LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[full]) + LogicalTableScan(table=[[CATALOG, SALES, EMP]]) + LogicalTableScan(table=[[CATALOG, SALES, BONUS]]) +]]> + </Resource> + <Resource name="planAfter"> + <![CDATA[ +EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first]) + EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[full]) + EnumerableTableScan(table=[[CATALOG, SALES, EMP]]) + EnumerableTableScan(table=[[CATALOG, SALES, BONUS]]) +]]> + </Resource> + </TestCase> + <TestCase name="testHashJoinLeftOuterJoinPushDownSort"> + <Resource name="sql"> + <![CDATA[select * from + sales.emp r left outer join sales.bonus s on r.ename=s.ename and r.job=s.job + order by r.job desc nulls last, r.ename nulls first]]> + </Resource> + <Resource name="planBefore"> + <![CDATA[ +LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first]) + LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], SAL0=[$11], COMM0=[$12]) + LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left]) + LogicalTableScan(table=[[CATALOG, SALES, EMP]]) + LogicalTableScan(table=[[CATALOG, SALES, BONUS]]) +]]> + </Resource> + <Resource name="planAfter"> + <![CDATA[ +EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left]) + EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first]) + EnumerableTableScan(table=[[CATALOG, SALES, EMP]]) + EnumerableTableScan(table=[[CATALOG, SALES, BONUS]]) Review comment: I am testing trait pass through so adding aggregation to child does not solve the problem. I tried to dig into the cost computation processing. It turns out that for this plan: ``` EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first]) EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left]) EnumerableTableScan(table=[[CATALOG, SALES, EMP]]) EnumerableTableScan(table=[[CATALOG, SALES, BONUS]]) ``` The top Sort's row count is just 2.1, which was is very surprising cause my debug point at the `computeSelfCost` in `EnumerableHashJoin` hasn't caught 2.1 row count, but ~50 row counts. As a comparison, the plan I am expecting: ``` EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left]) EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first]) EnumerableTableScan(table=[[CATALOG, SALES, EMP]]) EnumerableTableScan(table=[[CATALOG, SALES, BONUS]]) ``` The Sort gets 14 row counts, which is from `EnumerableTableScan(table=[[CATALOG, SALES, EMP]])`, which is correct, and that's why this Sort's sort is update to `1346.0848941260901`, which is understandable. So either I didn't fully understand how Calcite propagate row counts, or something wrong for top-down opt code (e.g. all debug point's stops didn't give me a 2.1 row count at EnumerableHashJoin while the EnumerableSort on top of HashJoin gets that 2.1 row count). Will continue investigating the root cause. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org