[ 
https://issues.apache.org/jira/browse/KYLIN-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865934#comment-17865934
 ] 

pengfei.zhan commented on KYLIN-5876:
-------------------------------------

h1. Root Cause
The Calcite query plan after SQL is converted to RelRoot is as follows, 
focusing on the CAST($0):DOUBLE part, where the stack overflow problem occurs 
in the CBO phase

{code:java}
LogicalSort(fetch=[500])
  LogicalProject(TEMP(Test)(3502400386)(0)=[CASE(=($1, 0), null, 
/(CAST($0):DOUBLE, $1))])
    LogicalJoin(condition=[true], joinType=[inner])
      LogicalProject(__measure__0=[$0])
        LogicalFilter(condition=[>($1, 0)])
          LogicalAggregate(group=[{}], __measure__0=[SUM($0)], agg#1=[COUNT()])
            LogicalProject(NUM1=[$2], $f1=[1])
              OlapTableScan(table=[[TDVT, CALCS]], ctx=[], fields=[[0, 1, 2, 3, 
4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 
25, 26]])
      LogicalProject(__measure__2=[$0])
        LogicalFilter(condition=[>($1, 0)])
          LogicalAggregate(group=[{}], __measure__2=[AVG($0)], agg#1=[COUNT()])
            LogicalProject(__measure__0=[$0], $f1=[1])
              LogicalProject(__measure__0=[$1])
                LogicalAggregate(group=[{0}], __measure__0=[SUM($1)])
                  LogicalProject(STR1=[$7], NUM1=[$2])
                    OlapTableScan(table=[[TDVT, CALCS]], ctx=[], fields=[[0, 1, 
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 
24, 25, 26]])
{code}

CBO's optimization process is unstable, repeatedly matching the optimization 
rules in order to select the least costly implementation, the problem here is 
related to the execution order of the optimization rules (KE code has done a 
version of renaming the optimization rules to move the classes, etc.), when the 
OlapProjectMergeRule is matched after the When OlapProjectMergeRule matches 
after ProjectReduceExpressionsRule, and the optimizer can't be sure that the 
converted optimization rule must be better than the pre-conversion logic plan, 
it keeps using CAST(xxx):DOUBLE to convert the previous ProjectRel, and 
CAST($0):DOUBLE is indefinitely nested into something like 
CAST(CAST(CAST($0):DOUBLE):DOUBLE):DOUBLE, which is the direct cause of the 
stack overflow.
I briefly tried both fixes to get past it, and the exact implementation needs 
to be discussed as to which one is better before fixing it:
In the implementation of kylin, the OlapProjectMergeRule conversion logic plan 
after setting the weights, always think that the optimization rules after the 
conversion is better than before the conversion (simple)
Reference DRILL: https://issues.apache.org/jira/browse/DRILL-6212 fix, took 
simplifyCast's logic and tried it ( I'm guessing that it skips too many nested 
calls and thus consumes less cost, so there is no infinite nesting. The 
calculation of cost is multifaceted, but at the moment the KE side only has 
simple information such as IO, memory, etc. as calculations.), but it needs to 
be adjusted to fit the KE to be compatible with the previous implementation 
(complex).


> Avoid infinite match in OlapProjectMergeRule caused by simplify projects
> ------------------------------------------------------------------------
>
>                 Key: KYLIN-5876
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5876
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Query Engine
>    Affects Versions: 5.0-alpha
>            Reporter: pengfei.zhan
>            Assignee: pengfei.zhan
>            Priority: Major
>             Fix For: 5.0.0
>
>
> tdvt test compared to version kylin-5.0-alpha a new failed sql has been 
> added, the query reports an error
> {code:sql}
> SELECT (CASE WHEN "t2"."__measure__2" = 0 THEN NULL ELSE 
> CAST("t0"."__measure__0" AS DOUBLE PRECISION) / "t2"."__measure__2" END) AS 
> "TEMP(Test)(3502400386)(0)"
> FROM (
>   SELECT SUM("CALCS"."NUM1") AS "__measure__0"
>   FROM "TDVT"."CALCS" "CALCS"
>   HAVING (COUNT(1) > 0)
> ) "t0"
>   CROSS JOIN (
>   SELECT AVG("t1"."__measure__0") AS "__measure__2"
>   FROM (
>     SELECT SUM("CALCS"."NUM1") AS "__measure__0"
>     FROM "TDVT"."CALCS" "CALCS"
>     GROUP BY "CALCS"."STR1"
>   ) "t1"
>   HAVING (COUNT(1) > 0)
> ) "t2"
> LIMIT 500
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to