alamb commented on issue #9450:
URL: 
https://github.com/apache/arrow-datafusion/issues/9450#issuecomment-1976435125

   I think the issue is that the partitioning in the test is not deterministic: 
`PARTITION BY t1.column1` is applied to a column where every value is the same (0).
   
   ```sql
   
   
   # generate BIGINT data from 1 to 1000 in multiple partitions
   statement ok
   CREATE TABLE t1000 (i BIGINT) AS
   WITH t AS (VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0), (0))
   SELECT ROW_NUMBER() OVER (PARTITION BY t1.column1) FROM t t1, t t2, t t3;
   
   # verify that there are multiple partitions in the input (i.e. MemoryExec says
   # there are 4 partitions) so that this tests multi-partition limit.
   query TT
   EXPLAIN SELECT DISTINCT i FROM t1000;
   ----
   logical_plan
   Aggregate: groupBy=[[t1000.i]], aggr=[[]]
   --TableScan: t1000 projection=[i]
   physical_plan
   AggregateExec: mode=FinalPartitioned, gby=[i@0 as i], aggr=[]
   --CoalesceBatchesExec: target_batch_size=8192
   ----RepartitionExec: partitioning=Hash([i@0], 4), input_partitions=4
   ------AggregateExec: mode=Partial, gby=[i@0 as i], aggr=[]
   --------MemoryExec: partitions=4, partition_sizes=[1, 2, 1, 1]
   ```
   
   I think we can fix this by changing the test to use different values.
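
To illustrate the "all the same values" point, here is a minimal sketch of how a constant key behaves under hash partitioning into 4 buckets. It is only an analogy: Python's built-in `hash` stands in for the engine's hash function (which is different), `hash_partition_sizes` is a hypothetical helper, and the ten distinct values are an assumption about what "different values" could look like, not the actual change to the test. It also does not model why the resulting batch layout varies between runs.

```python
from collections import Counter


def hash_partition_sizes(keys, num_partitions=4):
    """Count how many rows land in each bucket under hash(key) % num_partitions.

    Hypothetical helper for illustration only; not DataFusion's hashing.
    """
    counts = Counter(hash(k) % num_partitions for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# The test cross-joins a 10-row table three times: 10 * 10 * 10 = 1000 rows,
# and every row carries the same key value (0).
constant_keys = [0] * 1000

# Assumed alternative: ten distinct key values, each appearing 100 times.
distinct_keys = [v for v in range(10) for _ in range(100)]

constant_sizes = hash_partition_sizes(constant_keys)
distinct_sizes = hash_partition_sizes(distinct_keys)

print(constant_sizes)  # [1000, 0, 0, 0]
print(distinct_sizes)  # [300, 300, 200, 200]
```

With a constant key, hashing cannot spread the rows at all, so any multi-partition layout the test observes has to come from something other than the key values.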


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
