amaliujia commented on a change in pull request #1995:
URL: https://github.com/apache/calcite/pull/1995#discussion_r433631323



##########
File path: core/src/test/resources/org/apache/calcite/test/TopDownOptTest.xml
##########
@@ -648,4 +648,80 @@ EnumerableMergeJoin(condition=[AND(=($1, $5), =($0, $4))], 
joinType=[inner])
 ]]>
     </Resource>
   </TestCase>
+    <TestCase name="testHashJoinInnerJoinNotPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+"select * from
+sales.emp r join sales.bonus s on r.ename=s.ename and r.job=s.job
+order by r.job desc nulls last, r.ename nulls first
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], 
dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], 
SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], 
dir1=[ASC-nulls-first])
+  EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[inner])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinFullOuterJoinNotPushDownSort">
+        <Resource name="sql">
+            <![CDATA[
+"select * from
+sales.emp r full outer join sales.bonus s on r.ename=s.ename and r.job=s.job
+order by r.job desc nulls last, r.ename nulls first
+]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], 
dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], 
SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[full])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], 
dir1=[ASC-nulls-first])
+  EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[full])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+    </TestCase>
+    <TestCase name="testHashJoinLeftOuterJoinPushDownSort">
+        <Resource name="sql">
+            <![CDATA[select * from
+        sales.emp r left outer join sales.bonus s on r.ename=s.ename and 
r.job=s.job
+        order by r.job desc nulls last, r.ename nulls first]]>
+        </Resource>
+        <Resource name="planBefore">
+            <![CDATA[
+LogicalSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], 
dir1=[ASC-nulls-first])
+  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], 
SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], ENAME0=[$9], JOB0=[$10], 
SAL0=[$11], COMM0=[$12])
+    LogicalJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, BONUS]])
+]]>
+        </Resource>
+        <Resource name="planAfter">
+            <![CDATA[
+EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
+  EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], 
dir1=[ASC-nulls-first])
+    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
+    EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])

Review comment:
       I am testing trait pass through so adding aggregation to child does not 
solve the problem.
   
   I tried to dig into the cost computation processing. It turns out that for 
this plan:
   
   ```
   EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], 
dir1=[ASC-nulls-first])
     EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
       EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
       EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
   ```
   
   The top Sort's row count is just 2.1, which was is very surprising cause my 
debug point at the `computeSelfCost` in `EnumerableHashJoin` hasn't caught 2.1 
row count, but ~50 row counts.
   
   As a comparison, the plan I am expecting:
   ```
   EnumerableHashJoin(condition=[AND(=($1, $9), =($2, $10))], joinType=[left])
     EnumerableSort(sort0=[$2], sort1=[$1], dir0=[DESC-nulls-last], 
dir1=[ASC-nulls-first])
       EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
       EnumerableTableScan(table=[[CATALOG, SALES, BONUS]])
   ```
   The Sort gets 14 row counts, which is from 
`EnumerableTableScan(table=[[CATALOG, SALES, EMP]])`, which is correct, and 
that's why this Sort's sort is update to `1346.0848941260901`, which is 
understandable.
   
   
   So either I didn't fully understand how Calcite propagate row counts, or 
something wrong for top-down opt code (e.g. all debug point's stops didn't give 
me a 2.1 row count at EnumerableHashJoin while the EnumerableSort on top of 
HashJoin gets that 2.1 row count).
   
   Will continue investigating the root cause.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to