[jira] [Commented] (CALCITE-4542) Suboptimal plan is chosen when TopDownRuleDriver is enabled

Konstantin Orlov (Jira) Fri, 13 Aug 2021 09:34:23 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398769#comment-17398769
 ]


Konstantin Orlov commented on CALCITE-4542:
-------------------------------------------

Hi, [~rkondakov]!

I certainly will create the PR, but let's dig to the truth first.

I've done more research on the topic and have found that the root cause, 
probably, the 
org.apache.calcite.plan.volcano.TopDownRuleDriver#getOptimizeInputTask method. 
In case the rel's traits don't satisfy the group's traits, we convert the rel 
first, and then we schedule DeriveTask. The problem here that we cache the 
converted rel in passThroughCache (not a problem actually), but then we replace 
the original rel with converted one, so the task is scheduled for the converted 
rel. And when the task is executed, it simply does nothing, because the 
converted rel is in passThroughCache. 

> Suboptimal plan is chosen when TopDownRuleDriver is enabled 
> ------------------------------------------------------------
>
>                 Key: CALCITE-4542
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4542
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 1.26.0
>            Reporter: Konstantin Orlov
>            Priority: Major
>         Attachments: dump.txt
>
>
> When TopDownRuleDriver is enabled, suboptimal plan is chosen for query with 
> join.
> We have our own convention and implementation of all necessary relations. A 
> distributed join is considered less expensive by our cost system than a 
> single-distributed join, and the merge join is considered less expensive than 
> nested loop if index over join condition is present. Nevertheless the merge 
> join with a single distribution is chosen by optimizer.
> The query is:
> {code:java}
> select e1."empid", e1."deptno" from "emps" e1 join "emps" e2 on e1."empid" = 
> e2."empid"
> {code}
> Actual plan is:
> {code:java}
> MyProject(subset=[rel#16:RelSubset#2.MY.single.[]], empid=[$0], deptno=[$1]): 
> rowcount = 1500.0, cumulative cost = {1500.0 rows, 3000.0 cpu, 0.0 io}, id = 
> 21
>   MyMergeJoin(subset=[rel#20:RelSubset#1.MY.single.[]], condition=[=($0, 
> $5)], joinType=[inner]): rowcount = 1500.0, cumulative cost = {150.0 rows, 
> 0.0 cpu, 0.0 io}, id = 50
>     MyExchange(subset=[rel#25:RelSubset#0.MY.single.[]], 
> distribution=[single]): rowcount = 100.0, cumulative cost = 
> {9210.340371976183 rows, 100.0 cpu, 0.0 io}, id = 30
>       MyTableScan(subset=[rel#29:RelSubset#0.MY.any.[0]], table=[[hr, 
> emps]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, 
> id = 27
>     MyExchange(subset=[rel#25:RelSubset#0.MY.single.[]], 
> distribution=[single]): rowcount = 100.0, cumulative cost = 
> {9210.340371976183 rows, 100.0 cpu, 0.0 io}, id = 30
>       MyTableScan(subset=[rel#29:RelSubset#0.MY.any.[0]], table=[[hr, 
> emps]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, 
> id = 27
> {code}
> Expected plan is:
> {code:java}
> MyProject(subset=[rel#16:RelSubset#2.MY.single.[]], empid=[$0], deptno=[$1]): 
> rowcount = 1500.0, cumulative cost = {1500.0 rows, 3000.0 cpu, 0.0 io}, id = 
> 21
>   MyExchange(subset=[rel#20:RelSubset#1.MY.single.[]], 
> distribution=[single]): rowcount = 1500.0, cumulative cost = 
> {18420.680743952365 rows, 100.0 cpu, 0.0 io}, id = 24
>     MyMergeJoin(subset=[rel#23:RelSubset#1.MY.any.[]], condition=[=($0, $5)], 
> joinType=[inner]): rowcount = 1500.0, cumulative cost = {0.15 rows, 0.0 cpu, 
> 0.0 io}, id = 50
>       MyTableScan(subset=[rel#26:RelSubset#0.MY.hash[0].[0]], table=[[hr, 
> emps]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, 
> id = 25
>       MyTableScan(subset=[rel#26:RelSubset#0.MY.hash[0].[0]], table=[[hr, 
> emps]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, 
> id = 25
> {code}
> Planner [^dump.txt] doesn't contain the join with proper distribution.
> Reproducer could be found 
> [here|https://github.com/korlov42/calcite/tree/derive-not-being-called-repoducer].
>  Please run {{org.apache.calcite.tools.PlannerTest#test}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (CALCITE-4542) Suboptimal plan is chosen when TopDownRuleDriver is enabled

Reply via email to