[ 
https://issues.apache.org/jira/browse/CALCITE-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922727#comment-16922727
 ] 

Xiening Dai commented on CALCITE-2166:
--------------------------------------

hi [~vvysotskyi] I send out a PR using #1 proposal. Please help take a look. 
Note that the PR wouldn't completely eliminate this error. There are also other 
cases where error is due to the slate cache item (RelSubSet row count is not 
updated properly). Those cases should be addressed by CALCITE-2018. Let's hope 
we can move it forward. Thanks.

> Cumulative cost of RelSubset.best RelNode is increased after calling 
> RelSubset.propagateCostImprovements() for input RelNodes
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-2166
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2166
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.15.0
>            Reporter: Volodymyr Vysotskyi
>            Assignee: Danny Chan
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> After calling {{RelSubset.propagateCostImprovements()}} cumulative cost of 
> {{RelSubset.best}} {{RelNode}} may be increased due to the increase of the 
> non-cumulative cost caused by changing of input best {{RelNode}}.
> To observe this issue, add this code:
> {code:java}
>           if (subset.best != null) {
>             RelOptCost bestCost = getCost(subset.best, 
> RelMetadataQuery.instance());
>             if (!subset.bestCost.equals(bestCost)) {
>               throw new AssertionError(
>                 "relSubset [" + subset.getDescription()
>                   + "] has wrong best cost "
>                   + subset.bestCost + ". Correct cost is " + bestCost);
>             }
>           }
> {code}
> into {{VolcanoPlanner.validate()}} method (line 907).
> List of unit tests which fail with this check:
> {noformat}
> Failed tests: 
>   
> MaterializationTest.testJoinMaterializationUKFK9:1823->checkMaterialize:198->checkMaterialize:205->checkThatMaterialize:233
>  relSubset [rel#226287:Subset#8.ENUMERABLE.[]] has wrong best cost {221.5 
> rows, 128.25 cpu, 0.0 io}. Correct cost is {233.0 rows, 178.0 cpu, 0.0 io}
>   ScannableTableTest.testPFPushDownProjectFilterAggregateNested:279 relSubset 
> [rel#12950:Subset#5.ENUMERABLE.[]] has wrong best cost {63.8 rows, 62.308 
> cpu, 0.0 io}. Correct cost is {70.4 rows, 60.404 cpu, 0.0 io}
>   ScannableTableTest.testPFTableRefusesFilterCooperative:221 relSubset 
> [rel#13382:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   ScannableTableTest.testProjectableFilterableCooperative:148 relSubset 
> [rel#13611:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   ScannableTableTest.testProjectableFilterableNonCooperative:165 relSubset 
> [rel#13754:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   FrameworksTest.testUpdate:336->executeQuery:367 relSubset 
> [rel#22533:Subset#2.ENUMERABLE.any] has wrong best cost {19.5 rows, 37.75 
> cpu, 0.0 io}. Correct cost is {22.575 rows, 52.58 cpu, 0.0 io}
> {noformat}
> For the test {{MaterializationTest.testJoinMaterializationUKFK9}} initial 
> best plan was:
> {noformat}
> EnumerableProject(empid0=[$5], empid00=[$5], deptno0=[$7]): rowcount = 15.0, 
> cumulative cost = {15.0 rows, 45.0 cpu, 0.0 io}, id = 3989
>   EnumerableJoin(subset=[rel#3988:Subset#34.ENUMERABLE.[]], condition=[=($1, 
> $7)], joinType=[inner]): rowcount = 15.0, cumulative cost = {116.0 rows, 0.0 
> cpu, 0.0 io}, id = 4797
>     EnumerableFilter(subset=[rel#4274:Subset#47.ENUMERABLE.[0]], 
> condition=[=(CAST($2):VARCHAR CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary", 'Bill')]): rowcount = 1.0, cumulative cost = {1.0 
> rows, 1.0 cpu, 0.0 io}, id = 16522
>       EnumerableTableScan(subset=[rel#158:Subset#11.ENUMERABLE.[0]], 
> table=[[hr, m0]]): rowcount = 1.0, cumulative cost = {0.0 rows, 1.0 cpu, 0.0 
> io}, id = 79
>     EnumerableTableScan(subset=[rel#115:Subset#5.ENUMERABLE.[]], table=[[hr, 
> depts]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 
> io}, id = 62
> {noformat}
> Its cumulative cost is \{221.5 rows, 123.75 cpu, 0.0 io}
> After applying some rules it became:
> {noformat}
> EnumerableProject(empid0=[$3], empid00=[$3], deptno0=[$0]): rowcount = 2.25, 
> cumulative cost = {2.25 rows, 6.75 cpu, 0.0 io}, id = 4012
>   EnumerableFilter(subset=[rel#4007:Subset#41.ENUMERABLE.[]], 
> condition=[=(CAST($2):VARCHAR CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary", 'Bill')]): rowcount = 2.25, cumulative cost = 
> {2.25 rows, 15.0 cpu, 0.0 io}, id = 4811
>     EnumerableProject(subset=[rel#4203:Subset#61.ENUMERABLE.[]], deptno=[$7], 
> deptno0=[$1], name0=[$2], empid0=[$5]): rowcount = 15.0, cumulative cost = 
> {15.0 rows, 60.0 cpu, 0.0 io}, id = 4206
>       EnumerableJoin(subset=[rel#4204:Subset#52.ENUMERABLE.[]], 
> condition=[=($1, $7)], joinType=[inner]): rowcount = 15.0, cumulative cost = 
> {116.0 rows, 0.0 cpu, 0.0 io}, id = 4795
>         EnumerableTableScan(subset=[rel#158:Subset#11.ENUMERABLE.[0]], 
> table=[[hr, m0]]): rowcount = 1.0, cumulative cost = {0.0 rows, 1.0 cpu, 0.0 
> io}, id = 79
>         EnumerableTableScan(subset=[rel#115:Subset#5.ENUMERABLE.[]], 
> table=[[hr, depts]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 
> cpu, 0.0 io}, id = 62
> {noformat}
> Its cumulative cost is {{\{233.0 rows, 148.0 cpu, 0.0 io\}}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

Reply via email to