Re: [I] [Bug]: Beam Sql is ignoring aliases fields in some situations which causes to huge data loss [beam]

via GitHub Mon, 08 Apr 2024 04:49:05 -0700


brachipa commented on issue #30498:
URL: https://github.com/apache/beam/issues/30498#issuecomment-2042540303


   OK, I think I found the cause. Calcite checks whether the expression nodes 
are identical to the row fields
   
https://github.com/apache/calcite/blob/dec167ac18272c0cd8be477d6b162d7a31a62114/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2068C5-L2069C63
   using the `RexUtil.isIdentity` method: 
   ```
   public static boolean isIdentity(List<? extends RexNode> exps,
       RelDataType inputRowType) {
     return inputRowType.getFieldCount() == exps.size()
         && containIdentity(exps, inputRowType, Litmus.IGNORE);
   }
   ```
   
   In the failing example, the row has a wider schema than what we actually 
select (most of our schemas contain many more fields than we select). Because 
the field counts differ, the projection is not identified as an identity, so 
Calcite creates a project with the fields as they appear in the select, 
meaning with their aliases rather than their original field names.
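   
   To make the size comparison concrete, here is a minimal sketch of the same 
check. It is a simplified model and not Calcite's API: the class 
`IdentityCheckSketch`, the representation of each select expression as a bare 
input-field index, and the five-field row are all assumptions made for 
illustration.

```java
import java.util.List;

// Simplified model of the check, NOT Calcite's API: each select expression is
// represented only by the index of the input field it references.
final class IdentityCheckSketch {

    // Mirrors the shape of RexUtil.isIdentity: the projection counts as an
    // identity only if it has as many expressions as the input row has fields
    // and the i-th expression refers to the i-th input field.
    static boolean isIdentity(List<Integer> fieldRefs, int inputFieldCount) {
        if (inputFieldCount != fieldRefs.size()) {
            return false;
        }
        for (int i = 0; i < fieldRefs.size(); i++) {
            if (fieldRefs.get(i) != i) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int wideRowFieldCount = 5; // hypothetical input row with five fields
        // Selecting every field in order is an identity projection.
        System.out.println(isIdentity(List.of(0, 1, 2, 3, 4), wideRowFieldCount)); // prints true
        // Selecting only two of the five fields fails the size comparison.
        System.out.println(isIdentity(List.of(0, 1), wideRowFieldCount)); // prints false
    }
}
```

   In this model, any select that reads fewer fields than the row contains is 
never an identity, which matches the situation described above.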
   
   This is then ignored later on, in the `rename` method
   
https://github.com/apache/calcite/blob/dec167ac18272c0cd8be477d6b162d7a31a62114/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2130
   
   and the alias is skipped here:
   
https://github.com/apache/calcite/blob/dec167ac18272c0cd8be477d6b162d7a31a62114/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2142
   
   I believe the `isIdentity` check can cause more issues, and we need to 
understand why it is enforced. Isn't it valid for the select to have a 
different number of fields than the schema?
   
   In our case, we have one big row and run different queries on it, each 
with different fields in the select.


