alamb commented on issue #9450:
URL:
https://github.com/apache/arrow-datafusion/issues/9450#issuecomment-1976453784
So I think the problem is that the input is hash partitioned into 4
partitions but somehow one of the partitions gets two batches and which
partition gets the two batches is non deterministic
```
explain CREATE TABLE t1000 (i BIGINT) AS
WITH t AS (VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0), (0))
SELECT ROW_NUMBER() OVER (PARTITION BY t1.column1) FROM t t1, t t2, t t3;
----
logical_plan
CreateMemoryTable: Bare { table: "t1000" }
--Projection: CAST(ROW_NUMBER() PARTITION BY [t1.column1] ROWS BETWEEN
UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS Int64) AS i
----WindowAggr: windowExpr=[[ROW_NUMBER() PARTITION BY [t1.column1] ROWS
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]]
------CrossJoin:
--------CrossJoin:
----------SubqueryAlias: t1
------------SubqueryAlias: t
--------------Values: (Int64(0)), (Int64(0)), (Int64(0)), (Int64(0)),
(Int64(0))...
----------SubqueryAlias: t2
------------SubqueryAlias: t
--------------Projection:
----------------Values: (Int64(0)), (Int64(0)), (Int64(0)), (Int64(0)),
(Int64(0))...
--------SubqueryAlias: t3
----------SubqueryAlias: t
------------Projection:
--------------Values: (Int64(0)), (Int64(0)), (Int64(0)), (Int64(0)),
(Int64(0))...
```
Another way to fix the issue might be to add a configuration option such as
`datafusion.explain.show_statistics` that would control if the
`partition_sizes` were output in explain plan.
Something like
```sql
set datafusion.explain.show_sizes = false;
```
And then the MemoryExec output would be generated without partition_sizesL
```
MemoryExec: partitions=4
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]