[ https://issues.apache.org/jira/browse/FLINK-20539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18036366#comment-18036366 ]

Sergey Nuyanzin commented on FLINK-20539:
-----------------------------------------

After looking at this issue one more time, there are several
observations/findings (a sketch follows this list):
1. {{ExtendedSqlRowTypeNameSpec}} uses {{FULLY_QUALIFIED}}, which leads to a
number of struct type mismatches
2. While computing {{leastRestrictive}} for struct types, Calcite loses the
{{StructKind}} and resets it to {{FULLY_QUALIFIED}}, see CALCITE-5199
3. Calcite's {{SqlTypeUtil#convertTypeToSpec}} for rows uses
{{SqlRowTypeNameSpec}}, which again leads to {{FULLY_QUALIFIED}}.
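To illustrate, here is a minimal sketch (hypothetical table and field names,
assuming a {{TableEnvironment}} {{env}} as in the reproducer quoted below): a
ROW type spelled out in SQL travels through {{ExtendedSqlRowTypeNameSpec}} as
{{FULLY_QUALIFIED}}, while Flink's own row types are
{{PEEK_FIELDS_NO_EXPAND}}, so the validated and converted types disagree.
{code}
// Hypothetical sketch: the CAST target is parsed via ExtendedSqlRowTypeNameSpec
// and ends up FULLY_QUALIFIED, while the type of f itself is
// PEEK_FIELDS_NO_EXPAND, so the two struct types do not match.
env.executeSql(
    "CREATE TABLE T (\n"
    + "    f ROW<a INT, b STRING>\n"
    + ") WITH (\n"
    + "  'connector' = 'datagen'\n"
    + ")");
env.executeSql("SELECT CAST(f AS ROW<a INT, b STRING>) FROM T").print();
{code}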
There are also a couple of findings about nullability:
1. Calcite's {{SqlTypeUtil#convertTypeToSpec}} loses nullability, see
CALCITE-7062
2. In {{FlinkRexBuilder#makeFieldAccess}} there is special logic to adjust
the nullability of {{NOT NULL}} nested fields if the enclosing type is
nullable. The problem with this logic is that for very deeply nested
structures it might generate lots of casts (see the sketch below).
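A minimal sketch of the second point (hypothetical table, again assuming
{{env}} and that NOT NULL is accepted inside the ROW field definitions): each
access into a nullable row whose fields are declared {{NOT NULL}} has its
nullability relaxed, and with the old logic every nesting level could
contribute its own cast.
{code}
// Hypothetical sketch: f is nullable, but every nested field is NOT NULL.
// Accessing f.a.b forces the nullability of the intermediate results to be
// relaxed, which for deeply nested rows used to produce a cast per level.
env.executeSql(
    "CREATE TABLE Nested (\n"
    + "    f ROW<a ROW<b INT NOT NULL> NOT NULL>\n"
    + ") WITH (\n"
    + "  'connector' = 'datagen'\n"
    + ")");
env.executeSql("SELECT f.a.b FROM Nested").print();
{code}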

In the latest PR, https://github.com/apache/flink/pull/27158, I addressed all
these findings.
I played with all the cases mentioned in this issue and in some related ones;
so far none of them failed.

The downside of the fix (also put in the release notes field) is:
{quote}Previously, row types defined in SQL,
e.g. `SELECT CAST(f AS ROW<i INT NOT NULL>)`, ignored the `NOT NULL`
constraint. This was more aligned with the SQL standard but caused many type
inconsistencies and cryptic error messages when working with nested data. For
example, it prevented using rows in computed columns or join keys. The new
behavior takes nullability into consideration.{quote}
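Concretely (hypothetical table {{T}} with a nullable row column {{f}}, again
assuming {{env}}), the change looks like this:
{code}
// Hypothetical sketch of the behavior change:
// before the fix, the NOT NULL in the CAST target was silently dropped and
// the result column stayed nullable; after the fix, the result type is
// ROW<i INT NOT NULL>, so the constraint survives and rows can be used in
// computed columns or join keys.
env.executeSql("SELECT CAST(f AS ROW<i INT NOT NULL>) FROM T").print();
{code}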

> Type mismatch when using ROW in computed column
> -----------------------------------------------
>
>                 Key: FLINK-20539
>                 URL: https://issues.apache.org/jira/browse/FLINK-20539
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table SQL / API
>            Reporter: Timo Walther
>            Assignee: Sergey Nuyanzin
>            Priority: Major
>              Labels: auto-unassigned, pull-request-available
>             Fix For: 1.19.0, 1.18.2
>
>
> The following SQL:
> {code}
> env.executeSql(
>       "CREATE TABLE Orders (\n"
>       + "    order_number BIGINT,\n"
>       + "    price        INT,\n"
>       + "    first_name   STRING,\n"
>       + "    last_name    STRING,\n"
>       + "    buyer_name AS ROW(first_name, last_name)\n"
>       + ") WITH (\n"
>       + "  'connector' = 'datagen'\n"
>       + ")");
> env.executeSql("SELECT * FROM Orders").print();
> {code}
> Fails with:
> {code}
> Exception in thread "main" java.lang.AssertionError: Conversion to relational 
> algebra failed to preserve datatypes:
> validated type:
> RecordType(BIGINT order_number, INTEGER price, VARCHAR(2147483647) CHARACTER 
> SET "UTF-16LE" first_name, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
> last_name, RecordType:peek_no_expand(VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE" EXPR$0, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" EXPR$1) NOT 
> NULL buyer_name) NOT NULL
> converted type:
> RecordType(BIGINT order_number, INTEGER price, VARCHAR(2147483647) CHARACTER 
> SET "UTF-16LE" first_name, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
> last_name, RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" EXPR$0, 
> VARCHAR(2147483647) CHARACTER SET "UTF-16LE" EXPR$1) NOT NULL buyer_name) NOT 
> NULL
> rel:
> LogicalProject(order_number=[$0], price=[$1], first_name=[$2], 
> last_name=[$3], buyer_name=[ROW($2, $3)])
>   LogicalTableScan(table=[[default_catalog, default_database, Orders]])
>       at 
> org.apache.calcite.sql2rel.SqlToRelConverter.checkConvertedType(SqlToRelConverter.java:467)
>       at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:582)
> {code}


