brachipa commented on issue #30498: URL: https://github.com/apache/beam/issues/30498#issuecomment-2042540303
Ok, I think I found what causes it. Calcite checks whether the projected expressions are identical to the input row's fields (https://github.com/apache/calcite/blob/dec167ac18272c0cd8be477d6b162d7a31a62114/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2068C5-L2069C63) using the `RexUtil.isIdentity` method:

```java
public static boolean isIdentity(List<? extends RexNode> exps, RelDataType inputRowType) {
  return inputRowType.getFieldCount() == exps.size()
      && containIdentity(exps, inputRowType, Litmus.IGNORE);
}
```

In the failing example, the input row has a wider schema than what we actually select. Most of our schemas carry many more fields than any single query selects. Because the projection is then not recognized as an identity, Calcite creates a project whose fields appear as they do in the SELECT, i.e. under their aliases rather than their original field names. That project is then ignored in the `rename` method later on (https://github.com/apache/calcite/blob/dec167ac18272c0cd8be477d6b162d7a31a62114/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2130), and the alias is skipped (https://github.com/apache/calcite/blob/dec167ac18272c0cd8be477d6b162d7a31a62114/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2142).

I believe the `isIdentity` check can cause more issues, and we need to understand why it is enforced. Isn't it valid for a SELECT to contain a different number of fields than the schema? In our case we have one big row and run different queries on it, each selecting different fields.
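To make the size condition concrete, here is a minimal, dependency-free sketch of the shape of the `isIdentity` check. It is a simplified model, not Calcite's code: the class `IdentityCheckSketch` and its helpers are hypothetical, each projected expression is modeled as the integer index of the input field it references (a stand-in for `RexInputRef`), and field types are ignored, whereas Calcite's `containIdentity` also compares them.

```java
import java.util.List;

/**
 * Hypothetical, simplified model of Calcite's RexUtil.isIdentity.
 * A projected expression is modeled as the index of the input field it
 * references (stand-in for RexInputRef); field types are NOT compared here.
 */
public class IdentityCheckSketch {

    /** Identity only if the field counts match AND expression i references field i. */
    static boolean isIdentity(List<Integer> projectedRefs, List<String> inputFields) {
        return inputFields.size() == projectedRefs.size()
                && containIdentity(projectedRefs, inputFields);
    }

    /** Simplified containIdentity; only reached when both lists have the same size. */
    static boolean containIdentity(List<Integer> projectedRefs, List<String> inputFields) {
        for (int i = 0; i < inputFields.size(); i++) {
            if (projectedRefs.get(i) != i) { // Integer is unboxed before comparing
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A wide input row, like the "one big row" described above (hypothetical fields).
        List<String> wideRow = List.of("id", "name", "country", "amount");

        // SELECT id, name, country, amount: every field, in order.
        System.out.println(isIdentity(List.of(0, 1, 2, 3), wideRow)); // true

        // SELECT id AS user_id, name AS user_name: a subset of the fields.
        // Aliases play no part in the check; the size mismatch alone makes it false.
        System.out.println(isIdentity(List.of(0, 1), wideRow)); // false
    }
}
```

In this model, any query that selects fewer fields than the row contains fails the check before the per-field comparison even runs, which matches the situation described above where every query selects only part of one big row.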