[ 
https://issues.apache.org/jira/browse/CALCITE-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde resolved CALCITE-6691.
----------------------------------
    Resolution: Fixed

Fixed in 
[fae49bc|https://github.com/apache/calcite/commit/fae49bc06a9dac705fb266b7883d5f951cb4da57].

> Qualify expressions not updated when merging two Project
> --------------------------------------------------------
>
>                 Key: CALCITE-6691
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6691
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Lino Rosa
>            Assignee: Julian Hyde
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.39.0
>
>
> Given a query like the below
> {code:sql}
> WITH t0 AS (
>   SELECT
>     name,
>     salary
>     FROM mytable
> ),
> t1 AS (
>   SELECT
>     t0.name
>     FROM t0
>     qualify row_number() over (partition by t0.name order by t0.salary desc) 
> = 1
> )
> select name from t1
> {code}
> We end up with the *wrong* query plan below:
> {noformat}
> [Logical plan]
> LogicalProject(name=[$0])
>   LogicalFilter(condition=[$1])
>     LogicalProject(name=[$5], QualifyExpression=[=(ROW_NUMBER() OVER 
> (PARTITION BY $0 ORDER BY $1 DESC), 1)])
>       LogicalTableScan(table=[[mytable]])
> {noformat}
> Under {{PARTITION BY $0}}, the {{$0}} should be referencing {{t0.name}} which 
> on the select list on the same node is {{name=[$5]}}. So something is off.
> ----
> After some debugging I can trace back the issue.
> Here is the query plan *unsquashed*:
> {noformat}
> [Logical plan]
> LogicalProject(name=[$0])
>   LogicalProject(name=[$0])
>     LogicalFilter(condition=[$1])
>       LogicalProject(name=[$0], QualifyExpression=[=(ROW_NUMBER() OVER 
> (PARTITION BY $0 ORDER BY $1 DESC), 1)])
>         LogicalProject(name=[$5], salary=[$6])
>           LogicalTableScan(table=[[mytable]])
> {noformat}
> And here it is *squashed*:
> {noformat}
> [Logical plan]
> LogicalProject(name=[$0])
>   LogicalFilter(condition=[$1])
>     LogicalProject(name=[$5], QualifyExpression=[=(ROW_NUMBER() OVER 
> (PARTITION BY $0 ORDER BY $1 DESC), 1)])
>       LogicalTableScan(table=[[mytable]])
> {noformat}
> This is the relevant section before squashing:
> {noformat}
> (1)  LogicalProject(name=[$0], QualifyExpression=[=(ROW_NUMBER() OVER 
> (PARTITION BY $0 ORDER BY $1 DESC), 1)])
> (2)    LogicalProject(name=[$5], salary=[$6])
> (3)      LogicalTableScan(table=[[mytable]])
> {noformat}
> When it's generating {{(1)}} from {{SqlToRelConverter}}, Calcite realizes it 
> has an opportunity to squash {{(1)}} and {{(2)}} together. It will keep 
> {{(1)}} and delete {{(2)}}. But to do that it will first have to replace 
> references from {{(1)}} and rewrite them to push past the parent project 
> {{(2)}} (See RelOptUtil#pushPastProject).
> It does this in steps, first it goes over the projected columns (in this case 
> only {{name}}) then over any qualify expressions.
> The rough logic is straightforward. It takes {{name=[$0]}} from {{(1)}} and 
> replaces that with whatever sits on index {{$0}} on {{(2)}}. So {{name=[$0]}} 
> becomes {{name=[$5]}}. Then it removes {{(2)}} from the query plan.
> At this point we have this:
> {noformat}
> (1)  LogicalProject(name=[$5], QualifyExpression=[=(ROW_NUMBER() OVER 
> (PARTITION BY $0 ORDER BY $1 DESC), 1)])
> (3)      LogicalTableScan(table=[[mytable]])
> {noformat}
> Now Calcite will do the same for the {{QualifyExpression}} which still has 
> the wrong indexes. This is where it goes wrong. It eagerly removed {{(2)}} 
> from the plan so there's no way for it to even know it there was another 
> projection in there. It won't do anything and the indexes for 
> {{QualifyExpression}} are now all wrong. 
> More specifically right in this section inside {{RelBuilder#Project}}, 
> {{frame.rel}} will not be an instance of {{Project}} because we just removed 
> that project (it was {{(2)}}). It will be an instance of {{TableScan}} at 
> this point.
> {code:java}
> // Do not merge projection when top projection has correlation variables
>     bloat:
>     if (frame.rel instanceof Project
>         && config.bloat() >= 0
>         && variables.isEmpty()) {
>       final Project project = (Project) frame.rel;
> {code}
> ----
> I suppose the fix would be to delay removing the parent {{Project}} from the 
> plan to after we've processed the entire {{Project}}. But I don't know that 
> much about the project to try it. I'm happy to give it a shot if any of the 
> maintainers point me in the right direction.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to