[jira] [Updated] (CALCITE-6357) Calcite enforces select arguments count to be same as row schema fields which causes aliases to be ignored

Brachi Packter (Jira) Wed, 10 Apr 2024 05:22:36 -0700


     [ 
https://issues.apache.org/jira/browse/CALCITE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Brachi Packter updated CALCITE-6357:
------------------------------------
    Description: 
Calcite's RelBuilder.projectNamed checks whether the row size in the select is 
identical to the schema fields. If not, it creates a project with the fields as 
they appear in the select, meaning that if they have aliases, they are returned 
with their aliases.

Here, it checks if they are identical:

https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063

using the RexUtil.isIdentity method:

```
public static boolean isIdentity(List<? extends RexNode> exps,
    RelDataType inputRowType) {
  return inputRowType.getFieldCount() == exps.size()
      && containIdentity(exps, inputRowType, Litmus.IGNORE);
}
```
This is the problematic part: `inputRowType.getFieldCount() == exps.size()`.
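
To make the condition concrete, here is a minimal sketch of what the check 
amounts to. It is not Calcite code: the class name IdentityCheckSketch is 
hypothetical, each select expression is modeled as the index of the input field 
it references (a stand-in for RexInputRef), and the body of containIdentity is 
an assumption that ignores field types.

```java
import java.util.List;

/**
 * Simplified stand-in for the check quoted above; NOT Calcite code.
 * Each select expression is modeled as the index of the input field it
 * references, or -1 for any other kind of expression (an assumption made
 * for illustration only).
 */
final class IdentityCheckSketch {

  /** Same shape as the quoted method: equal count AND position-wise identity. */
  static boolean isIdentity(List<Integer> exps, int inputFieldCount) {
    return inputFieldCount == exps.size()
        && containIdentity(exps, inputFieldCount);
  }

  /** Assumed behavior: expression i must reference input field i. */
  static boolean containIdentity(List<Integer> exps, int inputFieldCount) {
    if (exps.size() < inputFieldCount) {
      return false;
    }
    for (int i = 0; i < inputFieldCount; i++) {
      if (exps.get(i) != i) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Input row has 3 fields and the select references all 3 in order.
    System.out.println(isIdentity(List.of(0, 1, 2), 3));
    // The select references only 2 of the 3 fields, so the count differs.
    System.out.println(isIdentity(List.of(0, 1), 3));
    // Same count, but the fields are reordered.
    System.out.println(isIdentity(List.of(1, 0, 2), 3));
  }
}
```

Under these assumptions, only a select that lists every input field in its 
original order counts as an identity, which is the case where the aliases are 
affected.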

If they are identical and the select carries aliases, the aliases are ignored 
later on, in the "rename" method:
https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2125

and the alias is skipped:

https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2137

This doesn't affect Calcite queries themselves, but Apache Beam applies an 
optimization on top of it, 
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java
which causes the aliases to be ignored, so data is suddenly returned without 
the correct column names.

I believe the isIdentity check can cause more issues if not fixed. We need to 
understand why it is enforced: isn't it valid for the select to have a 
different number of fields than the schema?

In our case we have one big row and we run different queries on it, each with 
different fields in the select.

Beam issue:
https://github.com/apache/beam/issues/30498 

  was:
Calcite RelBuilder.ProjectNamed cehcks if row size in the select is identical 
to schema fields, if no, it creates a project with fields as they appear in the 
select , meaning if they have aliases, they are returning with their aliases.

Here it checks if they are identical:

https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063

using RexUtil.isIdentity method:

```
 public static boolean isIdentity(List<? extends RexNode> exps,
      RelDataType inputRowType) {
    return inputRowType.getFieldCount() == exps.size()
        && containIdentity(exps, inputRowType, Litmus.IGNORE);
  }
```
This is the problematic part `inputRowType.getFieldCount() == exps.size()`

And then it is ignored in the "rename" method later on
https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2125

and alias is skipped

https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2137

This doesn't impact calcite queries, but in Apache Beam they are doing some 
optimization on top of it, 
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java
which cause aliases to be ignored, and data is returning suddenly without 
correct column field.

I believe the isIdentity check can causes more issues if not fixed, we need to 
understand why is it enforced? isn't it valid to have different size of fields 
in select from what we have in the schema?

In our case we have a one big row and we run on it different queries, each with 
different fields in the select.

Beam issue 
https://github.com/apache/beam/issues/30498 


> Calcite enforces select arguments count to be same as row schema fields which 
> causes aliases to be ignored
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-6357
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6357
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Brachi Packter
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
