alamb commented on issue #9450:
URL: 
https://github.com/apache/arrow-datafusion/issues/9450#issuecomment-1976453784

   So I think the problem is that the input is hash partitioned into 4 
partitions, but somehow one of the partitions gets two batches, and which 
partition gets the two batches is non-deterministic.
   
   ```
   explain CREATE TABLE t1000 (i BIGINT) AS
   WITH t AS (VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0), (0))
   SELECT ROW_NUMBER() OVER (PARTITION BY t1.column1) FROM t t1, t t2, t t3;
   ----
   logical_plan
   CreateMemoryTable: Bare { table: "t1000" }
   --Projection: CAST(ROW_NUMBER() PARTITION BY [t1.column1] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS Int64) AS i
   ----WindowAggr: windowExpr=[[ROW_NUMBER() PARTITION BY [t1.column1] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]]
   ------CrossJoin:
   --------CrossJoin:
   ----------SubqueryAlias: t1
   ------------SubqueryAlias: t
   --------------Values: (Int64(0)), (Int64(0)), (Int64(0)), (Int64(0)), (Int64(0))...
   ----------SubqueryAlias: t2
   ------------SubqueryAlias: t
   --------------Projection: 
   ----------------Values: (Int64(0)), (Int64(0)), (Int64(0)), (Int64(0)), (Int64(0))...
   --------SubqueryAlias: t3
   ----------SubqueryAlias: t
   ------------Projection: 
   --------------Values: (Int64(0)), (Int64(0)), (Int64(0)), (Int64(0)), (Int64(0))...
   ```
   
   Another way to fix the issue might be to add a configuration option, similar 
to `datafusion.explain.show_statistics`, that would control whether the 
`partition_sizes` are output in the explain plan.
   
   Something like
   ```sql
   set datafusion.explain.show_sizes = false;
   ```
   
   And then the MemoryExec output would be generated without `partition_sizes`:
   
   ```
   MemoryExec: partitions=4
   ```
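
To illustrate why hiding the sizes would make the explain output stable, here is a minimal sketch in Python. It is not DataFusion code: `ExplainOptions`, `format_memory_exec`, and the per-partition batch counts are hypothetical names and made-up values, and the rendered line format is only assumed to resemble the real MemoryExec display.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExplainOptions:
    # Hypothetical stand-in for the proposed `datafusion.explain.show_sizes` setting.
    show_sizes: bool = True


def format_memory_exec(partition_sizes: List[int], options: ExplainOptions) -> str:
    """Render one MemoryExec line; partition_sizes[i] is the batch count of partition i."""
    line = f"MemoryExec: partitions={len(partition_sizes)}"
    if options.show_sizes:
        # This suffix is the part that varies from run to run.
        line += f", partition_sizes={partition_sizes}"
    return line


# Made-up batch counts for two runs of the same query: the partition that
# receives two batches differs between the runs.
run_a = [2, 1, 1, 1]
run_b = [1, 1, 2, 1]

with_sizes = ExplainOptions(show_sizes=True)
without_sizes = ExplainOptions(show_sizes=False)

# With sizes shown, the two runs render differently, so an expected-output
# comparison would be flaky.
print(format_memory_exec(run_a, with_sizes))
print(format_memory_exec(run_b, with_sizes))

# With sizes hidden, both runs render the same line.
print(format_memory_exec(run_a, without_sizes))
print(format_memory_exec(run_b, without_sizes))
```

Under these assumptions, only the partition count remains in the output when the option is off, which is the deterministic part of the plan.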
   
   
   

