[
https://issues.apache.org/jira/browse/CALCITE-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julian Hyde resolved CALCITE-6691.
----------------------------------
Resolution: Fixed
Fixed in
[fae49bc|https://github.com/apache/calcite/commit/fae49bc06a9dac705fb266b7883d5f951cb4da57].
> Qualify expressions not updated when merging two Project
> --------------------------------------------------------
>
> Key: CALCITE-6691
> URL: https://issues.apache.org/jira/browse/CALCITE-6691
> Project: Calcite
> Issue Type: Bug
> Reporter: Lino Rosa
> Assignee: Julian Hyde
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.39.0
>
>
> Given a query like the below
> {code:sql}
> WITH t0 AS (
> SELECT
> name,
> salary
> FROM mytable
> ),
> t1 AS (
> SELECT
> t0.name
> FROM t0
> qualify row_number() over (partition by t0.name order by t0.salary desc)
> = 1
> )
> select name from t1
> {code}
> We end up with the *wrong* query plan below:
> {noformat}
> [Logical plan]
> LogicalProject(name=[$0])
> LogicalFilter(condition=[$1])
> LogicalProject(name=[$5], QualifyExpression=[=(ROW_NUMBER() OVER
> (PARTITION BY $0 ORDER BY $1 DESC), 1)])
> LogicalTableScan(table=[[mytable]])
> {noformat}
> Under {{PARTITION BY $0}}, the {{$0}} should be referencing {{t0.name}} which
> on the select list on the same node is {{name=[$5]}}. So something is off.
> ----
> After some debugging I can trace back the issue.
> Here is the query plan *unsquashed*:
> {noformat}
> [Logical plan]
> LogicalProject(name=[$0])
> LogicalProject(name=[$0])
> LogicalFilter(condition=[$1])
> LogicalProject(name=[$0], QualifyExpression=[=(ROW_NUMBER() OVER
> (PARTITION BY $0 ORDER BY $1 DESC), 1)])
> LogicalProject(name=[$5], salary=[$6])
> LogicalTableScan(table=[[mytable]])
> {noformat}
> And here it is *squashed*:
> {noformat}
> [Logical plan]
> LogicalProject(name=[$0])
> LogicalFilter(condition=[$1])
> LogicalProject(name=[$5], QualifyExpression=[=(ROW_NUMBER() OVER
> (PARTITION BY $0 ORDER BY $1 DESC), 1)])
> LogicalTableScan(table=[[mytable]])
> {noformat}
> This is the relevant section before squashing:
> {noformat}
> (1) LogicalProject(name=[$0], QualifyExpression=[=(ROW_NUMBER() OVER
> (PARTITION BY $0 ORDER BY $1 DESC), 1)])
> (2) LogicalProject(name=[$5], salary=[$6])
> (3) LogicalTableScan(table=[[mytable]])
> {noformat}
> When it's generating {{(1)}} from {{SqlToRelConverter}}, Calcite realizes it
> has an opportunity to squash {{(1)}} and {{(2)}} together. It will keep
> {{(1)}} and delete {{(2)}}. But to do that it will first have to replace
> references from {{(1)}} and rewrite them to push past the parent project
> {{(2)}} (See RelOptUtil#pushPastProject).
> It does this in steps, first it goes over the projected columns (in this case
> only {{name}}) then over any qualify expressions.
> The rough logic is straightforward. It takes {{name=[$0]}} from {{(1)}} and
> replaces that with whatever sits on index {{$0}} on {{(2)}}. So {{name=[$0]}}
> becomes {{name=[$5]}}. Then it removes {{(2)}} from the query plan.
> At this point we have this:
> {noformat}
> (1) LogicalProject(name=[$5], QualifyExpression=[=(ROW_NUMBER() OVER
> (PARTITION BY $0 ORDER BY $1 DESC), 1)])
> (3) LogicalTableScan(table=[[mytable]])
> {noformat}
> Now Calcite will do the same for the {{QualifyExpression}} which still has
> the wrong indexes. This is where it goes wrong. It eagerly removed {{(2)}}
> from the plan so there's no way for it to even know it there was another
> projection in there. It won't do anything and the indexes for
> {{QualifyExpression}} are now all wrong.
> More specifically right in this section inside {{RelBuilder#Project}},
> {{frame.rel}} will not be an instance of {{Project}} because we just removed
> that project (it was {{(2)}}). It will be an instance of {{TableScan}} at
> this point.
> {code:java}
> // Do not merge projection when top projection has correlation variables
> bloat:
> if (frame.rel instanceof Project
> && config.bloat() >= 0
> && variables.isEmpty()) {
> final Project project = (Project) frame.rel;
> {code}
> ----
> I suppose the fix would be to delay removing the parent {{Project}} from the
> plan to after we've processed the entire {{Project}}. But I don't know that
> much about the project to try it. I'm happy to give it a shot if any of the
> maintainers point me in the right direction.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)