[
https://issues.apache.org/jira/browse/FLINK-33217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775748#comment-17775748
]
Sergey Nuyanzin commented on FLINK-33217:
-----------------------------------------
After looking a bit deeper, it looks like this is not a Calcite issue.
For example, this query
{code:sql}
with Book as (
SELECT *
FROM
(
VALUES
ROW (array[1, 2])
, ROW (array[11])
, ROW (array[22])
) Book (authorId)
) select * from Book b left join unnest(b.authorId) on true
{code}
works fine in Calcite while failing in Flink with a similar issue.
I think the reason is that Calcite has a dedicated {{UNNEST}} operator [1]
which for some reason is not used in Flink. Instead there is
{{LogicalUnnestRule}} [2], which tries to translate the result of unnest into a
table scan, and this is where the error happens.
Based on the code of this rule
{code:scala}
relNode match {
  case rs: HepRelVertex =>
    convert(getRel(rs))
  case f: LogicalProject =>
    ...
  case f: LogicalFilter =>
    ...
  case uc: Uncollect =>
    ...
}
{code}
there could be four different cases failing with the same or a similar error
when joining with unnest.
Current thoughts on how to fix it:
1. Move to Calcite's {{UNNEST}} operator (however, it is still not clear why it
was not used in the first place...)
2. While parsing and building the AST, and also during conversion, Calcite
rewrites {{LEFT JOIN}} into something with a nullable type on the right-hand
side, which is also part of the cause; we could adjust the conversion so this
is not done for {{LEFT JOIN UNNEST}}
3. We could try handling this in {{LogicalUnnestRule}} by widening types, e.g.
forcing them to be nullable... however, this could lead to wrong final types
(e.g. nullable instead of not nullable)
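As context for option 2, this is standard {{LEFT JOIN}} semantics: left-input rows without a match are padded with {{NULL}} on the right, so the joined row type must mark right-side columns nullable regardless of how they were declared. An illustrative sketch (table and column names are made up):
{code:sql}
-- t2.v may be declared NOT NULL, but after a LEFT JOIN the joined
-- row type must allow NULL for v, because unmatched t1 rows are
-- padded with NULLs on the right side
SELECT t1.id, t2.v
FROM t1 LEFT JOIN t2 ON t1.id = t2.id
{code}
This is exactly the nullability mismatch visible in the error below: the rel before registration has a nullable {{bd_name}}, while the re-registered rel derives {{NOT NULL}} from the array element type.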
[1]
https://github.com/apache/calcite/blob/bf56743554ea27d250d41db2eb83806f9e626b55/core/src/main/java/org/apache/calcite/sql/SqlUnnestOperator.java#L34
[2]
https://github.com/apache/flink/blob/91d81c427aa6312841ca868d54e8ce6ea721cd60/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/rules/logical/LogicalUnnestRule.scala#L52
> Flink SQL: UNNEST fails on LEFT JOIN with NOT NULL type in array
> ---------------------------------------------------------------------
>
> Key: FLINK-33217
> URL: https://issues.apache.org/jira/browse/FLINK-33217
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Planner
> Affects Versions: 1.15.3, 1.18.0, 1.19.0
> Reporter: Robert Metzger
> Priority: Major
> Attachments: UnnestNullErrorTest.scala
>
>
> Steps to reproduce:
> Take a column of type
> {code:java}
> business_data ARRAY<STRING NOT NULL>
> {code}
> Take this query
> {code:java}
> select bd_name from reproduce_unnest LEFT JOIN
> UNNEST(reproduce_unnest.business_data) AS exploded_bd(bd_name) ON true
> {code}
> And get this error
> {code:java}
> Caused by: java.lang.AssertionError: Type mismatch:
> rowtype of rel before registration: RecordType(VARCHAR(2147483647) CHARACTER
> SET "UTF-16LE" NOT NULL ARRAY business_data, VARCHAR(2147483647) CHARACTER
> SET "UTF-16LE" bd_name) NOT NULL
> rowtype of rel after registration: RecordType(VARCHAR(2147483647) CHARACTER
> SET "UTF-16LE" NOT NULL ARRAY business_data, VARCHAR(2147483647) CHARACTER
> SET "UTF-16LE" NOT NULL f0) NOT NULL
> Difference:
> bd_name: VARCHAR(2147483647) CHARACTER SET "UTF-16LE" -> VARCHAR(2147483647)
> CHARACTER SET "UTF-16LE" NOT NULL
> at org.apache.calcite.util.Litmus$1.fail(Litmus.java:32)
> at org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:2206)
> at
> org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:275)
> at
> org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1270)
> at
> org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:598)
> at
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:613)
> at
> org.apache.calcite.plan.volcano.VolcanoPlanner.changeTraits(VolcanoPlanner.java:498)
> at
> org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:315)
> at
> org.apache.flink.table.planner.plan.optimize.program.FlinkVolcanoProgram.optimize(FlinkVolcanoProgram.scala:62)
> {code}
> I have implemented a small test case, which fails against Flink 1.15, 1.18
> and the latest master branch.
> Workarounds:
> 1. Drop "NOT NULL" in array type
> 2. Drop "LEFT" from "LEFT JOIN".
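> The two workarounds above, applied to this reproduction (illustrative sketch,
> using the column and table names from the test case):
> {code:sql}
> -- Workaround 1: declare the element type without NOT NULL:
> --   business_data ARRAY<STRING>
> -- Workaround 2: use a plain join instead of LEFT JOIN:
> select bd_name from reproduce_unnest JOIN
> UNNEST(reproduce_unnest.business_data) AS exploded_bd(bd_name) ON true
> {code}
> Both avoid the nullability mismatch, at the cost of either losing the NOT NULL
> constraint or dropping rows with empty arrays.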
--
This message was sent by Atlassian Jira
(v8.20.10#820010)