kosiew commented on issue #1131:
URL:
https://github.com/apache/datafusion-python/issues/1131#issuecomment-2903100506
hi @l1t1 ,
duckdb too goes into an infinite loop:
```
❯ duckdb
v1.1.3 19864453f7
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D WITH RECURSIVE t(i, lv) AS (
SELECT generate_series AS i, 1 AS lv
FROM generate_series(1, 6, 1)
UNION ALL
SELECT MAX(i), MAX(lv) + 1
FROM t
WHERE lv < 2
)
SELECT *
FROM t
WHERE lv = 2;
```
I think this isn’t a bug in DataFusion-Python so much as a quirk of SQL’s
recursive-CTE semantics when you use UNION ALL with an aggregate that never
changes its output. Any engine that follows the SQL standard will do the same.
DuckDB likewise spins forever because the recursive member keeps re-emitting
the same row, and UNION ALL does not remove duplicates, so the CTE never
reaches a fixpoint.
Amending the query to UNION does complete in duckdb:
```
❯ duckdb
v1.1.3 19864453f7
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D WITH RECURSIVE t(i, lv) AS (
SELECT generate_series AS i, 1 AS lv
FROM generate_series(1, 6, 1)
UNION
SELECT MAX(i), MAX(lv) + 1
FROM t
WHERE lv < 2
)
SELECT *
FROM t
WHERE lv = 2;
┌───────┬───────┐
│ i │ lv │
│ int64 │ int32 │
├───────┼───────┤
│ 6 │ 2 │
└───────┴───────┘
```
But the same query is not yet supported in Datafusion:
```
>>> sql = """
... with recursive t as(select value i,1 lv from generate_series(1,6,1)
... union
... select max(i),max(lv)+1 from t where lv<2)
... select * from t where lv=2;
... """
>>> df = ctx.sql(sql)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File
"/Users/kosiew/GitHub/datafusion-python/python/datafusion/context.py", line
589, in sql
return DataFrame(self.ctx.sql(query))
^^^^^^^^^^^^^^^^^^^
Exception: DataFusion error: NotImplemented("Recursive queries with a
distinct 'UNION' (in which the previous iteration's results will be
de-duplicated) is not supported")
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]