jccampagne opened a new issue, #22093:
URL: https://github.com/apache/datafusion/issues/22093
### Describe the bug
I stumbled a potential issue in DataFusion while testing recursive CTEs in
my project:
Here's such a query:
```sql
WITH RECURSIVE chain AS (
SELECT 'A' AS level
UNION ALL
SELECT level || '>' FROM chain WHERE length(level) < 5
)
select * from chain
```
TLDR: the issue is:
- Anchor member `SELECT 'A' AS level` has field name `level`
- the recursive part `SELECT level || '>' ...` has field name `chain.level
|| Utf8(\">\")`
After a bit of digging and debugging I was able to reproduce the error in
the CLI.
I also tested in Postgres for reference (see below).
## Exhibit error in CLI with datafusion-cli
The issue is also apparent in the CLI. Using Datafusion 53.1.0.
At first it seems ok in the CLI:
```sql
% datafusion-cli
DataFusion CLI v53.1.0
> WITH RECURSIVE chain AS (
SELECT 'A' AS level
UNION ALL
SELECT level || '>' FROM chain WHERE length(level) < 5
)
SELECT * FROM chain
;
+-------+
| level |
+-------+
| A |
| A> |
| A>> |
| A>>> |
| A>>>> |
+-------+
5 row(s) fetched.
Elapsed 0.005 seconds.
```
This works fine, maybe the pretty printer is lenient and is silent or
ignores the different schemas.
However if you try to create the table using `CREATE TABLE ... AS ...`, the
error is apparent:
```sql
> create table test as ( WITH RECURSIVE chain AS (
SELECT 'A' AS level
UNION ALL
SELECT level || '>' FROM chain WHERE length(level) < 5
)
SELECT * FROM chain );
Error during planning: Mismatch between schema and batches
```
It fails.
## Debug in code:
Adding some debugging shows the value of the returned record batches
(without `as level` for the recursive case:
```rust
[
RecordBatch { schema: Schema { fields: [Field { name: "level", data_type:
Utf8 }], metadata: {} }, columns: [StringArray [ "A", ]], row_count: 1 },
RecordBatch { schema: Schema { fields: [Field { name: "chain.level ||
Utf8(\">\")", data_type: Utf8 }], metadata: {} }, columns: [StringArray [ "A>",
]], row_count: 1 },
RecordBatch { schema: Schema { fields: [Field { name: "chain.level ||
Utf8(\">\")", data_type: Utf8 }], metadata: {} }, columns: [StringArray [
"A>>", ]], row_count: 1 },
RecordBatch { schema: Schema { fields: [Field { name: "chain.level ||
Utf8(\">\")", data_type: Utf8 }], metadata: {} }, columns: [StringArray [
"A>>>", ]], row_count: 1 },
RecordBatch { schema: Schema { fields: [Field { name: "chain.level ||
Utf8(\">\")", data_type: Utf8 }], metadata: {} }, columns: [StringArray [
"A>>>>", ]], row_count: 1 }
]
```
As you can see the field names in the schema different in the anchor member
and the recursive case.
With the "... as level" in the recursive case, the record batches are:
```rust
[RecordBatch { schema: Schema { fields: [Field { name: "level", data_type:
Utf8 }], metadata: {} }, columns: [StringArray [ "A", ]], row_count: 1 },
RecordBatch { schema: Schema { fields: [Field { name: "level", data_type:
Utf8 }], metadata: {} }, columns: [StringArray [ "A>", ]], row_count: 1 },
RecordBatch { schema: Schema { fields: [Field { name: "level", data_type:
Utf8 }], metadata: {} }, columns: [StringArray [ "A>>", ]], row_count: 1 },
RecordBatch { schema: Schema { fields: [Field { name: "level", data_type:
Utf8 }], metadata: {} }, columns: [StringArray [ "A>>>", ]], row_count: 1 },
RecordBatch { schema: Schema { fields: [Field { name: "level", data_type:
Utf8 }], metadata: {} }, columns: [StringArray [ "A>>>>",]], row_count: 1 }]
```
## Compatibility check with Postgres
I tested with Postgres to check compatibility, it works fine (with or
without the `CREATE TABLE`):
```sql
% psql
psql (14.22 (Homebrew))
Type "help" for help.
jc=# WITH RECURSIVE chain AS (
SELECT 'A' AS level
UNION ALL
SELECT level || '>' FROM chain WHERE length(level) < 5
)
select * from chain;
level
-------
A
A>
A>>
A>>>
A>>>>
(5 rows)
jc=# create table test as ( WITH RECURSIVE chain AS (
SELECT 'A' AS level
UNION ALL
SELECT level || '>' FROM chain WHERE length(level) < 5
)
SELECT * FROM chain );
SELECT 5
jc=# select * from test;
level
-------
A
A>
A>>
A>>>
A>>>>
(5 rows)
```
### To Reproduce
Run this in Datafusion CLI:
```
create table test as ( WITH RECURSIVE chain AS (
SELECT 'A' AS level
UNION ALL
SELECT level || '>' FROM chain WHERE length(level) < 5
)
SELECT * FROM chain );
```
It should fail with `Error during planning: Mismatch between schema and
batches`
### Expected behavior
It should behave like:
```
> create table test as ( WITH RECURSIVE chain AS (
SELECT 'A' AS level
UNION ALL
SELECT level || '>' as level FROM chain WHERE length(level) < 5
)
SELECT * FROM chain );
0 row(s) fetched.
Elapsed 0.003 seconds.
> select * from test;
+-------+
| level |
+-------+
| A |
| A> |
| A>> |
| A>>> |
| A>>>> |
+-------+
5 row(s) fetched.
Elapsed 0.004 seconds.
```
note the `... as level ...` in `SELECT level || '>' as level FROM chain ... `
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]