alamb commented on issue #9450: URL: https://github.com/apache/arrow-datafusion/issues/9450#issuecomment-1976435125
I think the issue is that the partitioning of the test is not deterministic: `PARTITION BY t1.column1` has all the same values.

```sql
# generate BIGINT data from 1 to 1000 in multiple partitions
statement ok
CREATE TABLE t1000 (i BIGINT) AS
WITH t AS (VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0), (0))
SELECT ROW_NUMBER() OVER (PARTITION BY t1.column1) FROM t t1, t t2, t t3;

# verify that there are multiple partitions in the input (i.e. MemoryExec says
# there are 4 partitions) so that this tests multi-partition limit.
query TT
EXPLAIN SELECT DISTINCT i FROM t1000;
----
logical_plan
Aggregate: groupBy=[[t1000.i]], aggr=[[]]
--TableScan: t1000 projection=[i]
physical_plan
AggregateExec: mode=FinalPartitioned, gby=[i@0 as i], aggr=[]
--CoalesceBatchesExec: target_batch_size=8192
----RepartitionExec: partitioning=Hash([i@0], 4), input_partitions=4
------AggregateExec: mode=Partial, gby=[i@0 as i], aggr=[]
--------MemoryExec: partitions=4, partition_sizes=[1, 2, 1, 1]
```

I think we can fix this by changing the test to use different values.
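
To illustrate the point about identical `PARTITION BY` values, here is a rough Python sketch. It is not DataFusion code: the helper `partition_sizes` is hypothetical, and `key % num_partitions` is only an assumed stand-in for a real hash function. It shows that a constant key sends every row to the same hash partition, while varied keys spread the rows in a way that depends only on the data.

```python
from collections import Counter


def partition_sizes(keys, num_partitions=4):
    """Count how many rows land in each partition under a toy hash scheme.

    Assumption: `key % num_partitions` stands in for a real hash function;
    it is not the hash DataFusion uses.
    """
    counts = Counter(key % num_partitions for key in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# 1000 rows that all share one partition key, like the VALUES (0), ... (0) input
constant_keys = [0] * 1000

# 1000 rows whose keys cycle through ten different values (hypothetical alternative)
varied_keys = [n % 10 for n in range(1000)]

print(partition_sizes(constant_keys))  # [1000, 0, 0, 0]
print(partition_sizes(varied_keys))    # [300, 300, 200, 200]
```

Whether distinct values alone make `partition_sizes` in the `EXPLAIN` output stable would still need to be confirmed by running the test repeatedly.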
