[jira] [Commented] (CALCITE-2166) Cumulative cost of RelSubset.best RelNode is increased after calling RelSubset.propagateCostImprovements() for input RelNodes

Xiening Dai (Jira) Mon, 16 Sep 2019 03:05:04 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930395#comment-16930395
 ]


Xiening Dai commented on CALCITE-2166:
--------------------------------------

Hi [~danny0405], sorry, I only just saw your comments here in JIRA. I chose #2 
because with #1 we could run into the other error, "rel x has lower cost than 
the best of subset", which is the problem we were trying to fix in the first 
place. #2 pays a slightly higher cost but guarantees that the best rel of a 
subset is its cheapest rel. Several places in the memo are broken, and we need 
to fix them one by one. I agree that 2018 is more fundamental, and personally I 
would prefer to get that fixed first, but the previous consensus was to fix 
2166 first.
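
The invariant behind the "rel x has lower cost than the best of subset" error 
can be sketched as below. This is only an illustration, not Calcite code: the 
class name, the method name, and the use of plain doubles instead of 
RelOptCost are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative check only; names and the double-valued costs are hypothetical. */
public class BestIsCheapestCheck {

  /**
   * Returns the index of the first rel whose cost is strictly lower than the
   * cost recorded for the subset's best rel, or -1 if the invariant holds.
   */
  static int findCheaperThanBest(List<Double> relCosts, double bestCost) {
    for (int i = 0; i < relCosts.size(); i++) {
      if (relCosts.get(i) < bestCost) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    // Invariant holds: no rel is cheaper than the recorded best cost.
    System.out.println(findCheaperThanBest(Arrays.asList(140.0, 120.0), 110.0));
    // Invariant violated: the rel at index 1 is cheaper than the recorded best.
    System.out.println(findCheaperThanBest(Arrays.asList(140.0, 105.0), 110.0));
  }
}
```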

> Cumulative cost of RelSubset.best RelNode is increased after calling 
> RelSubset.propagateCostImprovements() for input RelNodes
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-2166
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2166
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.15.0
>            Reporter: Volodymyr Vysotskyi
>            Assignee: Danny Chan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.22.0
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> After calling {{RelSubset.propagateCostImprovements()}}, the cumulative cost 
> of the {{RelSubset.best}} {{RelNode}} may increase, because its 
> non-cumulative cost can increase when the best {{RelNode}} of an input changes.
> To observe this issue, add this code:
> {code:java}
>           if (subset.best != null) {
>             RelOptCost bestCost =
>                 getCost(subset.best, RelMetadataQuery.instance());
>             if (!subset.bestCost.equals(bestCost)) {
>               throw new AssertionError(
>                 "relSubset [" + subset.getDescription()
>                   + "] has wrong best cost "
>                   + subset.bestCost + ". Correct cost is " + bestCost);
>             }
>           }
> {code}
> into the {{VolcanoPlanner.validate()}} method (line 907).
> The following unit tests fail with this check:
> {noformat}
> Failed tests: 
>   
> MaterializationTest.testJoinMaterializationUKFK9:1823->checkMaterialize:198->checkMaterialize:205->checkThatMaterialize:233
>  relSubset [rel#226287:Subset#8.ENUMERABLE.[]] has wrong best cost {221.5 
> rows, 128.25 cpu, 0.0 io}. Correct cost is {233.0 rows, 178.0 cpu, 0.0 io}
>   ScannableTableTest.testPFPushDownProjectFilterAggregateNested:279 relSubset 
> [rel#12950:Subset#5.ENUMERABLE.[]] has wrong best cost {63.8 rows, 62.308 
> cpu, 0.0 io}. Correct cost is {70.4 rows, 60.404 cpu, 0.0 io}
>   ScannableTableTest.testPFTableRefusesFilterCooperative:221 relSubset 
> [rel#13382:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   ScannableTableTest.testProjectableFilterableCooperative:148 relSubset 
> [rel#13611:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   ScannableTableTest.testProjectableFilterableNonCooperative:165 relSubset 
> [rel#13754:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   FrameworksTest.testUpdate:336->executeQuery:367 relSubset 
> [rel#22533:Subset#2.ENUMERABLE.any] has wrong best cost {19.5 rows, 37.75 
> cpu, 0.0 io}. Correct cost is {22.575 rows, 52.58 cpu, 0.0 io}
> {noformat}
> For the test {{MaterializationTest.testJoinMaterializationUKFK9}}, the 
> initial best plan was:
> {noformat}
> EnumerableProject(empid0=[$5], empid00=[$5], deptno0=[$7]): rowcount = 15.0, 
> cumulative cost = {15.0 rows, 45.0 cpu, 0.0 io}, id = 3989
>   EnumerableJoin(subset=[rel#3988:Subset#34.ENUMERABLE.[]], condition=[=($1, 
> $7)], joinType=[inner]): rowcount = 15.0, cumulative cost = {116.0 rows, 0.0 
> cpu, 0.0 io}, id = 4797
>     EnumerableFilter(subset=[rel#4274:Subset#47.ENUMERABLE.[0]], 
> condition=[=(CAST($2):VARCHAR CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary", 'Bill')]): rowcount = 1.0, cumulative cost = {1.0 
> rows, 1.0 cpu, 0.0 io}, id = 16522
>       EnumerableTableScan(subset=[rel#158:Subset#11.ENUMERABLE.[0]], 
> table=[[hr, m0]]): rowcount = 1.0, cumulative cost = {0.0 rows, 1.0 cpu, 0.0 
> io}, id = 79
>     EnumerableTableScan(subset=[rel#115:Subset#5.ENUMERABLE.[]], table=[[hr, 
> depts]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 
> io}, id = 62
> {noformat}
> Its cumulative cost is {{\{221.5 rows, 123.75 cpu, 0.0 io\}}}.
> After applying some rules, it became:
> {noformat}
> EnumerableProject(empid0=[$3], empid00=[$3], deptno0=[$0]): rowcount = 2.25, 
> cumulative cost = {2.25 rows, 6.75 cpu, 0.0 io}, id = 4012
>   EnumerableFilter(subset=[rel#4007:Subset#41.ENUMERABLE.[]], 
> condition=[=(CAST($2):VARCHAR CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary", 'Bill')]): rowcount = 2.25, cumulative cost = 
> {2.25 rows, 15.0 cpu, 0.0 io}, id = 4811
>     EnumerableProject(subset=[rel#4203:Subset#61.ENUMERABLE.[]], deptno=[$7], 
> deptno0=[$1], name0=[$2], empid0=[$5]): rowcount = 15.0, cumulative cost = 
> {15.0 rows, 60.0 cpu, 0.0 io}, id = 4206
>       EnumerableJoin(subset=[rel#4204:Subset#52.ENUMERABLE.[]], 
> condition=[=($1, $7)], joinType=[inner]): rowcount = 15.0, cumulative cost = 
> {116.0 rows, 0.0 cpu, 0.0 io}, id = 4795
>         EnumerableTableScan(subset=[rel#158:Subset#11.ENUMERABLE.[0]], 
> table=[[hr, m0]]): rowcount = 1.0, cumulative cost = {0.0 rows, 1.0 cpu, 0.0 
> io}, id = 79
>         EnumerableTableScan(subset=[rel#115:Subset#5.ENUMERABLE.[]], 
> table=[[hr, depts]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 
> cpu, 0.0 io}, id = 62
> {noformat}
> Its cumulative cost is {{\{233.0 rows, 148.0 cpu, 0.0 io\}}}.
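
The effect described in the issue can be reproduced in a toy model. The sketch 
below is not Calcite code: all class names, the single-number cost, and the 
cost-per-input-row factor are assumptions chosen for illustration. It shows a 
parent whose own cost depends on the row count of its input's best plan, 
combined with propagation that only ever lowers the cached cost.

```java
/** Toy model (not Calcite code) of a cached best cost that goes stale. */
public class StaleBestCostDemo {

  /** One alternative in the input subset: its cumulative cost and row estimate. */
  static final class Leaf {
    final double cost;
    final double rows;

    Leaf(double cost, double rows) {
      this.cost = cost;
      this.rows = rows;
    }
  }

  /** Input subset that remembers the cheapest alternative registered so far. */
  static final class InputSubset {
    Leaf best;

    /** Returns true if the new alternative became the best one. */
    boolean register(Leaf leaf) {
      if (best == null || leaf.cost < best.cost) {
        best = leaf;
        return true;
      }
      return false;
    }
  }

  /** Parent subset with one operator whose own cost grows with input rows. */
  static final class ParentSubset {
    static final double COST_PER_INPUT_ROW = 10.0; // hypothetical factor
    final InputSubset input;
    double cachedBestCost = Double.POSITIVE_INFINITY;

    ParentSubset(InputSubset input) {
      this.input = input;
    }

    /** Cumulative cost recomputed from the input's current best alternative. */
    double recomputeCost() {
      return COST_PER_INPUT_ROW * input.best.rows + input.best.cost;
    }

    /** Improvement-only propagation: the cached cost is never raised. */
    void propagateImprovement() {
      double fresh = recomputeCost();
      if (fresh < cachedBestCost) {
        cachedBestCost = fresh;
      }
    }
  }

  public static void main(String[] args) {
    InputSubset input = new InputSubset();
    ParentSubset parent = new ParentSubset(input);

    input.register(new Leaf(100.0, 1.0));
    parent.propagateImprovement(); // cached cost: 10 * 1 + 100 = 110

    // A cheaper input alternative that produces more rows becomes the best.
    input.register(new Leaf(90.0, 5.0));
    parent.propagateImprovement(); // fresh cost: 10 * 5 + 90 = 140, cache unchanged

    // prints "cached=110.0 actual=140.0"
    System.out.println(
        "cached=" + parent.cachedBestCost + " actual=" + parent.recomputeCost());
  }
}
```

In this model the input's improvement makes the parent more expensive, so the 
cached value no longer matches the recomputed one, which is the kind of 
mismatch the assertion in the description reports.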



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
