jccampagne opened a new issue, #22093:
URL: https://github.com/apache/datafusion/issues/22093

   ### Describe the bug
   
   
   I stumbled a potential issue in DataFusion while testing recursive CTEs in 
my project:
   
   Here's such a query:
   
   ```sql
   WITH RECURSIVE chain AS (
     SELECT 'A' AS level
     UNION ALL
     SELECT level || '>' FROM chain WHERE length(level) < 5
   )
   select * from chain
   ```
   
   TLDR: the issue is:
   - Anchor member `SELECT 'A' AS level` has field name `level`
   - the recursive part `SELECT level || '>' ...` has field name `chain.level 
|| Utf8(\">\")`
   
   After a bit of digging and debugging I was able to reproduce the error in 
the CLI.
   I also tested in Postgres for reference (see below).
   
   ## Exhibit error in CLI with datafusion-cli
   
   The issue is also apparent in the CLI. Using Datafusion 53.1.0.
   
   At first it seems ok in the CLI:
   ```sql
   % datafusion-cli
   DataFusion CLI v53.1.0
   > WITH RECURSIVE chain AS (
     SELECT 'A' AS level
     UNION ALL
     SELECT level || '>' FROM chain WHERE length(level) < 5
   )
   SELECT * FROM chain
   ;
   +-------+
   | level |
   +-------+
   | A     |
   | A>    |
   | A>>   |
   | A>>>  |
   | A>>>> |
   +-------+
   5 row(s) fetched.
   Elapsed 0.005 seconds.
   ```
   
   This works fine, maybe the pretty printer is lenient and is silent or 
ignores the different schemas.
   However if you try to create the table using `CREATE TABLE ... AS ...`, the 
error is apparent:
   ```sql
   > create table test as ( WITH RECURSIVE chain AS (
     SELECT 'A' AS level
     UNION ALL
     SELECT level || '>' FROM chain WHERE length(level) < 5
   )
   SELECT * FROM chain );
   Error during planning: Mismatch between schema and batches
   ```
   
   It fails.
   
   
   ## Debug in code:
   
   Adding some debugging shows the value of the returned record batches  
(without `as level` for the recursive case:
   ```rust
   [
   RecordBatch { schema: Schema { fields: [Field { name: "level", data_type: 
Utf8 }], metadata: {} }, columns: [StringArray [ "A", ]], row_count: 1 },
   RecordBatch { schema: Schema { fields: [Field { name: "chain.level || 
Utf8(\">\")", data_type: Utf8 }], metadata: {} }, columns: [StringArray [ "A>", 
]], row_count: 1 },
   RecordBatch { schema: Schema { fields: [Field { name: "chain.level || 
Utf8(\">\")", data_type: Utf8 }], metadata: {} }, columns: [StringArray [ 
"A>>", ]], row_count: 1 },
   RecordBatch { schema: Schema { fields: [Field { name: "chain.level || 
Utf8(\">\")", data_type: Utf8 }], metadata: {} }, columns: [StringArray [ 
"A>>>", ]], row_count: 1 },
   RecordBatch { schema: Schema { fields: [Field { name: "chain.level || 
Utf8(\">\")", data_type: Utf8 }], metadata: {} }, columns: [StringArray [ 
"A>>>>", ]], row_count: 1 }
   ]
   ```
   
   As you can see the field names in the schema different in the anchor member 
and the recursive case.
   
   With the "... as level" in the recursive case, the record batches are:
   ```rust
   [RecordBatch { schema: Schema { fields: [Field { name: "level", data_type: 
Utf8 }], metadata: {} }, columns: [StringArray [ "A", ]], row_count: 1 },
    RecordBatch { schema: Schema { fields: [Field { name: "level", data_type: 
Utf8 }], metadata: {} }, columns: [StringArray [ "A>", ]], row_count: 1 },
    RecordBatch { schema: Schema { fields: [Field { name: "level", data_type: 
Utf8 }], metadata: {} }, columns: [StringArray [ "A>>", ]], row_count: 1 },
    RecordBatch { schema: Schema { fields: [Field { name: "level", data_type: 
Utf8 }], metadata: {} }, columns: [StringArray [ "A>>>", ]], row_count: 1 },
    RecordBatch { schema: Schema { fields: [Field { name: "level", data_type: 
Utf8 }], metadata: {} }, columns: [StringArray [ "A>>>>",]], row_count: 1 }]
   ```
   
   
   ## Compatibility check with Postgres
   
   I tested with Postgres to check compatibility, it works fine (with or 
without the `CREATE TABLE`):
   ```sql
   % psql
   psql (14.22 (Homebrew))
   Type "help" for help.
   
   jc=# WITH RECURSIVE chain AS (
     SELECT 'A' AS level
     UNION ALL
     SELECT level || '>' FROM chain WHERE length(level) < 5
   )
   select * from chain;
    level
   -------
    A
    A>
    A>>
    A>>>
    A>>>>
   (5 rows)
   
   jc=# create table test as ( WITH RECURSIVE chain AS (
     SELECT 'A' AS level
     UNION ALL
     SELECT level || '>' FROM chain WHERE length(level) < 5
   )
   SELECT * FROM chain );
   SELECT 5
   jc=# select * from test;
    level
   -------
    A
    A>
    A>>
    A>>>
    A>>>>
   (5 rows)
   ```
   
   
   ### To Reproduce
   
   Run this in Datafusion CLI:
   
   ```
   create table test as ( WITH RECURSIVE chain AS (
     SELECT 'A' AS level
     UNION ALL
     SELECT level || '>' FROM chain WHERE length(level) < 5
   )
   SELECT * FROM chain );
   ```
   It should fail with `Error during planning: Mismatch between schema and 
batches`
   
   
   ### Expected behavior
   
   It should behave like:
   
   ```
   > create table test as ( WITH RECURSIVE chain AS (
     SELECT 'A' AS level
     UNION ALL
     SELECT level || '>' as level FROM chain WHERE length(level) < 5
   )
   SELECT * FROM chain );
   0 row(s) fetched.
   Elapsed 0.003 seconds.
   
   > select * from test;
   +-------+
   | level |
   +-------+
   | A     |
   | A>    |
   | A>>   |
   | A>>>  |
   | A>>>> |
   +-------+
   5 row(s) fetched.
   Elapsed 0.004 seconds.
   ```
   
   note the `... as level ...` in `SELECT level || '>' as level FROM chain ... `
   
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to