[
https://issues.apache.org/jira/browse/CALCITE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502791#comment-17502791
]
Haisheng Yuan commented on CALCITE-4542:
----------------------------------------
Hi [~korlov], thank you for providing the detailed test case.
However, the test case doesn't show a bug in the optimizer. Instead, I
found something strange in the test case itself.
In
https://github.com/korlov42/calcite/commit/f4a5c2f01e0ec67156f7e91c6b5839dca1db6776#diff-853e16de41615bab150cd7ecf77a868e4b4ab0396d5c9f283d4247c222eba814R974,
you should use RelOptRule.convert() instead of node.copy(). As you can see,
the table scan (MyTableScan(subset=[rel#29:RelSubset#0.MY.any.[0]])) in the
final plan is not even registered in the MEMO structure.
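To illustrate the difference, here is a minimal toy sketch in plain Java. It
is not Calcite code: MemoSketch, Node, Memo, copyWithTrait, register,
isRegistered and convert are hypothetical names made up for this example. It
only models the general idea that a raw copy with new traits never passes
through the planner, whereas a convert-style request is routed through the
memo and therefore ends up registered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a planner memo. NOT Calcite's API; every name is hypothetical. */
class MemoSketch {

    /** A plan node identified by a name and one trait (e.g. a distribution). */
    static final class Node {
        final String name;
        final String trait;

        Node(String name, String trait) {
            this.name = name;
            this.trait = trait;
        }

        /** Raw copy: builds a new node with another trait, bypassing the memo. */
        Node copyWithTrait(String newTrait) {
            return new Node(name, newTrait);
        }
    }

    /** Toy memo: remembers every node it was asked to register, grouped by trait. */
    static final class Memo {
        private final Map<String, List<Node>> subsets = new HashMap<>();

        Node register(Node node) {
            subsets.computeIfAbsent(node.trait, k -> new ArrayList<>()).add(node);
            return node;
        }

        /** Identity-based lookup: Node does not override equals(). */
        boolean isRegistered(Node node) {
            List<Node> nodes = subsets.get(node.trait);
            return nodes != null && nodes.contains(node);
        }

        /** Convert-style request: the memo creates AND registers the new node. */
        Node convert(Node node, String trait) {
            if (node.trait.equals(trait) && isRegistered(node)) {
                return node;
            }
            return register(node.copyWithTrait(trait));
        }
    }

    public static void main(String[] args) {
        Memo memo = new Memo();
        Node scan = memo.register(new Node("MyTableScan", "any"));

        Node rawCopy = scan.copyWithTrait("hash[0]");   // the memo never sees this node
        Node converted = memo.convert(scan, "hash[0]"); // created through the memo

        System.out.println("raw copy registered:  " + memo.isRegistered(rawCopy));
        System.out.println("converted registered: " + memo.isRegistered(converted));
    }
}
```

In this toy model the raw copy is unknown to the memo, which is analogous to
the unregistered MyTableScan in the final plan above.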
> Suboptimal plan is chosen when TopDownRuleDriver is enabled
> ------------------------------------------------------------
>
> Key: CALCITE-4542
> URL: https://issues.apache.org/jira/browse/CALCITE-4542
> Project: Calcite
> Issue Type: Bug
> Affects Versions: 1.26.0
> Reporter: Konstantin Orlov
> Priority: Major
> Attachments: dump.txt
>
>
> When TopDownRuleDriver is enabled, a suboptimal plan is chosen for a query
> with a join.
> We have our own convention and implementations of all necessary relations. Our
> cost model considers a distributed join less expensive than a join with a
> single distribution, and a merge join less expensive than a nested loop join
> if an index over the join condition is present. Nevertheless, the optimizer
> chooses the merge join with a single distribution.
> The query is:
> {code:java}
> select e1."empid", e1."deptno" from "emps" e1 join "emps" e2 on e1."empid" =
> e2."empid"
> {code}
> Actual plan is:
> {code:java}
> MyProject(subset=[rel#16:RelSubset#2.MY.single.[]], empid=[$0], deptno=[$1]): rowcount = 1500.0, cumulative cost = {1500.0 rows, 3000.0 cpu, 0.0 io}, id = 21
>   MyMergeJoin(subset=[rel#20:RelSubset#1.MY.single.[]], condition=[=($0, $5)], joinType=[inner]): rowcount = 1500.0, cumulative cost = {150.0 rows, 0.0 cpu, 0.0 io}, id = 50
>     MyExchange(subset=[rel#25:RelSubset#0.MY.single.[]], distribution=[single]): rowcount = 100.0, cumulative cost = {9210.340371976183 rows, 100.0 cpu, 0.0 io}, id = 30
>       MyTableScan(subset=[rel#29:RelSubset#0.MY.any.[0]], table=[[hr, emps]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 27
>     MyExchange(subset=[rel#25:RelSubset#0.MY.single.[]], distribution=[single]): rowcount = 100.0, cumulative cost = {9210.340371976183 rows, 100.0 cpu, 0.0 io}, id = 30
>       MyTableScan(subset=[rel#29:RelSubset#0.MY.any.[0]], table=[[hr, emps]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 27
> {code}
> Expected plan is:
> {code:java}
> MyProject(subset=[rel#16:RelSubset#2.MY.single.[]], empid=[$0], deptno=[$1]): rowcount = 1500.0, cumulative cost = {1500.0 rows, 3000.0 cpu, 0.0 io}, id = 21
>   MyExchange(subset=[rel#20:RelSubset#1.MY.single.[]], distribution=[single]): rowcount = 1500.0, cumulative cost = {18420.680743952365 rows, 100.0 cpu, 0.0 io}, id = 24
>     MyMergeJoin(subset=[rel#23:RelSubset#1.MY.any.[]], condition=[=($0, $5)], joinType=[inner]): rowcount = 1500.0, cumulative cost = {0.15 rows, 0.0 cpu, 0.0 io}, id = 50
>       MyTableScan(subset=[rel#26:RelSubset#0.MY.hash[0].[0]], table=[[hr, emps]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 25
>       MyTableScan(subset=[rel#26:RelSubset#0.MY.hash[0].[0]], table=[[hr, emps]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 25
> {code}
> The planner dump [^dump.txt] doesn't contain the join with the proper
> distribution.
> A reproducer can be found
> [here|https://github.com/korlov42/calcite/tree/derive-not-being-called-repoducer].
> Please run {{org.apache.calcite.tools.PlannerTest#test}}.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)