chenkovsky commented on code in PR #16983:
URL: https://github.com/apache/datafusion/pull/16983#discussion_r2318987444
##########
datafusion/sqllogictest/test_files/aggregate.slt:
##########
@@ -7390,6 +7392,41 @@ query error Error during planning: ORDER BY and WITHIN GROUP clauses cannot be u
SELECT array_agg(a_varchar order by a_varchar) WITHIN GROUP (ORDER BY a_varchar)
FROM (VALUES ('a'), ('d'), ('c'), ('a')) t(a_varchar);
+statement ok
+SET datafusion.execution.target_partitions = 1;
+
+query TT
+EXPLAIN select * from (select 'id' as id union all select 'id' as id order by id) group by grouping sets ((id), ());
+----
+logical_plan
+01)Projection: id
+02)--Aggregate: groupBy=[[GROUPING SETS ((id), ())]], aggr=[[]]
+03)----Union
+04)------Projection: Utf8("id") AS id
+05)--------EmptyRelation: rows=1
+06)------Projection: Utf8("id") AS id
+07)--------EmptyRelation: rows=1
+physical_plan
+01)ProjectionExec: expr=[id@0 as id]
+02)--AggregateExec: mode=FinalPartitioned, gby=[id@0 as id, __grouping_id@1 as __grouping_id], aggr=[], ordering_mode=PartiallySorted([0])
+03)----CoalesceBatchesExec: target_batch_size=8192
+04)------RepartitionExec: partitioning=Hash([id@0, __grouping_id@1], 1), input_partitions=2
Review Comment:
It has a single partition, but multiple record batches. The aggregation assumes
that records in the same group are adjacent, which is not true in this case.
The repartition solves this problem.
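To make the adjacency point concrete, here is a toy Rust model. It is hypothetical and is not DataFusion's `AggregateExec` code. It assumes each single-row batch from the two `UNION ALL` inputs expands into one row per grouping set, `(id)` and `()`, tagged with an illustrative grouping id. A grouper that only starts a new group when the key changes sees the two groups interleaved across batches and emits duplicates, while a hash-table grouper does not depend on adjacency.

```rust
use std::collections::HashSet;

/// Toy group key: the `id` value (None = NULL) plus an illustrative
/// grouping-set id. Hypothetical types, not DataFusion's.
type Key = (Option<&'static str>, u8);

/// Emulates an aggregation that assumes rows of the same group are adjacent:
/// it only compares each row with the most recent group it emitted.
fn adjacency_groups(batches: &[Vec<Key>]) -> Vec<Key> {
    let mut out: Vec<Key> = Vec::new();
    for batch in batches {
        for &key in batch {
            if out.last() != Some(&key) {
                out.push(key);
            }
        }
    }
    out
}

/// Hash-based grouping: every key is looked up in a table, so the order of
/// rows across batches does not matter. Keeps first-seen order for display.
fn hash_groups(batches: &[Vec<Key>]) -> Vec<Key> {
    let mut seen: HashSet<Key> = HashSet::new();
    let mut out = Vec::new();
    for batch in batches {
        for &key in batch {
            if seen.insert(key) {
                out.push(key);
            }
        }
    }
    out
}

fn main() {
    // One batch per UNION ALL input; each is assumed to expand into one row
    // for grouping set (id) and one for grouping set ().
    let batches: Vec<Vec<Key>> = vec![
        vec![(Some("id"), 0), (None, 1)],
        vec![(Some("id"), 0), (None, 1)],
    ];

    // Adjacency-based grouping yields 4 "groups"; hash-based yields 2.
    println!("adjacency-based: {:?}", adjacency_groups(&batches));
    println!("hash-based:      {:?}", hash_groups(&batches));
}
```

The sketch only illustrates why the adjacency assumption breaks when a group's rows are spread over several record batches; it does not model how the `RepartitionExec` in the plan above resolves it.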