Lino Rosa created CALCITE-6691:
----------------------------------

             Summary: Qualify expressions not updated when merging two Project
                 Key: CALCITE-6691
                 URL: https://issues.apache.org/jira/browse/CALCITE-6691
             Project: Calcite
          Issue Type: Bug
            Reporter: Lino Rosa


Given a query like the below
{code:sql}
WITH t0 AS (
  SELECT
    name,
    salary
    FROM mytable
),
t1 AS (
  SELECT
    t0.name
    FROM t0
    qualify row_number() over (partition by t0.name order by t0.salary desc) = 1
)
select name from t1
{code}
We end up with the *wrong* query plan below:
{noformat}
[Logical plan]
LogicalProject(name=[$0])
  LogicalFilter(condition=[$1])
    LogicalProject(name=[$5], QualifyExpression=[=(ROW_NUMBER() OVER (PARTITION 
BY $0 ORDER BY $1 DESC), 1)])
      LogicalTableScan(table=[[mytable]])
{noformat}

Under {{PARTITION BY $0}}, the {{$0}} should be referencing {{t0.name}} which 
on the select list on the same node is {{name=[$5]}}. So something is off.

----

After some debugging I can trace back the issue.

Here is the query plan *unsquashed*:


{noformat}
[Logical plan]
LogicalProject(name=[$0])
  LogicalProject(name=[$0])
    LogicalFilter(condition=[$1])
      LogicalProject(name=[$0], QualifyExpression=[=(ROW_NUMBER() OVER 
(PARTITION BY $0 ORDER BY $1 DESC), 1)])
        LogicalProject(name=[$5], salary=[$6])
          LogicalTableScan(table=[[mytable]])
{noformat}

And here it is *squashed*:


{noformat}
[Logical plan]
LogicalProject(name=[$0])
  LogicalFilter(condition=[$1])
    LogicalProject(name=[$5], QualifyExpression=[=(ROW_NUMBER() OVER (PARTITION 
BY $0 ORDER BY $1 DESC), 1)])
      LogicalTableScan(table=[[mytable]])
{noformat}

This is the relevant section before squashing:

{noformat}
(1)  LogicalProject(name=[$0], QualifyExpression=[=(ROW_NUMBER() OVER 
(PARTITION BY $0 ORDER BY $1 DESC), 1)])
(2)    LogicalProject(name=[$5], salary=[$6])
(3)      LogicalTableScan(table=[[mytable]])
{noformat}

When it's generating {{(1)}} from {{SqlToRelConverter}}, Calcite realizes it 
has an opportunity to squash {{(1)}} and {{(2)}} together. It will keep {{(1)}} 
and delete {{(2)}}. But to do that it will first have to replace references 
from {{(1)}} and rewrite them to push past the parent project {{(2)}} (See 
RelOptUtil#pushPastProject).

It does this in steps, first it goes over the projected columns (in this case 
only {{name}}) then over any qualify expressions.

The rough logic is straightforward. It takes {{name=[$0]}} from {{(1)}} and 
replaces that with whatever sits on index {{$0}} on {{(2)}}. So {{name=[$0]}} 
becomes {{name=[$5]}}. Then it removes {{(2)}} from the query plan.

At this point we have this:


{noformat}
(1)  LogicalProject(name=[$5], QualifyExpression=[=(ROW_NUMBER() OVER 
(PARTITION BY $0 ORDER BY $1 DESC), 1)])
(3)      LogicalTableScan(table=[[mytable]])
{noformat}

Now Calcite will do the same for the {{QualifyExpression}} which still has the 
wrong indexes. This is where it goes wrong. It eagerly removed {{(2)}} from the 
plan so there's no way for it to even know it there was another projection in 
there. It won't do anything and the indexes for {{QualifyExpression}} are now 
all wrong. 

More specifically right in this section inside {{RelBuilder#Project}}, 
{{frame.rel}} will not be an instance of {{Project}} because we just removed 
that project (it was {{(2)}}). It will be an instance of {{TableScan}} at this 
point.

{code:java}
// Do not merge projection when top projection has correlation variables
    bloat:
    if (frame.rel instanceof Project
        && config.bloat() >= 0
        && variables.isEmpty()) {
      final Project project = (Project) frame.rel;
{code}

----

I suppose the fix would be to delay removing the parent {{Project}} from the 
plan to after we've processed the entire {{Project}}. But I don't know that 
much about the project to try it. I'm happy to give it a shot if any of the 
maintainers point me in the right direction.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to