alamb commented on code in PR #16985:
URL: https://github.com/apache/datafusion/pull/16985#discussion_r2442156539
##########
datafusion/sqllogictest/test_files/unnest.slt:
##########
@@ -941,3 +941,242 @@ where min_height * width1 = (
)
----
4 7 4 28
+
+## Unnest with ordering on unrelated column is preserved
+query TT
+EXPLAIN WITH unnested AS (SELECT
+ ROW_NUMBER() OVER () AS generated_id,
+ unnest(array[value]) as ar
+ FROM range(1,5)) SELECT array_agg(ar) FROM unnested group by generated_id;
+----
+logical_plan
+01)Projection: array_agg(unnested.ar)
+02)--Aggregate: groupBy=[[unnested.generated_id]], aggr=[[array_agg(unnested.ar)]]
+03)----SubqueryAlias: unnested
+04)------Projection: generated_id, __unnest_placeholder(make_array(range().value),depth=1) AS UNNEST(make_array(range().value)) AS ar
+05)--------Unnest: lists[__unnest_placeholder(make_array(range().value))|depth=1] structs[]
+06)----------Projection: row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS generated_id, make_array(range().value) AS __unnest_placeholder(make_array(range().value))
+07)------------WindowAggr: windowExpr=[[row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]]
+08)--------------TableScan: range() projection=[value]
+physical_plan
+01)ProjectionExec: expr=[array_agg(unnested.ar)@1 as array_agg(unnested.ar)]
+02)--AggregateExec: mode=FinalPartitioned, gby=[generated_id@0 as generated_id], aggr=[array_agg(unnested.ar)], ordering_mode=Sorted
+03)----SortExec: expr=[generated_id@0 ASC NULLS LAST], preserve_partitioning=[true]
Review Comment:
Actually, I spent some more time looking, and I think your code is working as
expected.

Namely, note the `ordering_mode=Sorted` on the `AggregateExec` above the
`UnnestExec`: it means that the `GROUP BY` columns (in this case
`generated_id`) are sorted, as expected.
```
06)----------AggregateExec: mode=Partial, gby=[generated_id@0 as generated_id], aggr=[array_agg(unnested.ar)], ordering_mode=Sorted
07)------------ProjectionExec: expr=[generated_id@0 as generated_id, __unnest_placeholder(make_array(range().value),depth=1)@1 as ar]
08)--------------UnnestExec
09)----------------ProjectionExec: expr=[row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING@1 as generated_id, make_array(value@0) as __unnest_placeholder(make_array(range().value))]
10)------------------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
11)--------------------BoundedWindowAggExec: wdw=[row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: Field { name: "row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING", data_type: UInt64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING], mode=[Sorted]
```
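To illustrate what a sorted grouping mode buys us, here is a rough Python model (not DataFusion's implementation; the function name and the sample rows are made up for this sketch). When the input is already sorted on the group key, each group can be emitted as soon as the key changes, without hashing all groups first:

```python
from itertools import groupby


def sorted_group_array_agg(rows):
    """Toy model of array_agg(ar) GROUP BY generated_id over sorted input.

    `rows` is an iterable of (generated_id, ar) pairs that is assumed to be
    already sorted by generated_id. groupby() only merges adjacent equal
    keys, so a group is complete as soon as the key changes.
    """
    for key, group in groupby(rows, key=lambda row: row[0]):
        yield key, [ar for _, ar in group]


# Hypothetical rows, sorted by the first element (the group key).
rows = [(1, 10), (1, 11), (2, 20), (3, 30), (3, 31)]
result = list(sorted_group_array_agg(rows))
print(result)
```

If the input were not sorted on `generated_id`, this adjacent-key approach would split one group into several, which is why the sortedness of the `GROUP BY` columns matters here.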
The reason there is a sort in the new plan is that the optimizer has
decided to repartition the intermediate aggregate result (unrelated to this PR):
```
03)----SortExec: expr=[generated_id@0 ASC NULLS LAST], preserve_partitioning=[true]
04)------CoalesceBatchesExec: target_batch_size=8192
05)--------RepartitionExec: partitioning=Hash([generated_id@0], 4), input_partitions=4
06)----------AggregateExec: mode=Partial, gby=[generated_id@0 as generated_id], aggr=[array_agg(unnested.ar)], ordering_mode=Sorted
```
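As a rough sketch of why a per-partition sort can be needed after a hash repartition, here is a toy Python model (again not DataFusion's code: `key % n_out` stands in for a real hash function, and the arrival order of the inputs is just one possible interleaving). Each input partition is sorted, but an output partition receives rows from several inputs, so its combined stream is generally not sorted:

```python
def hash_repartition(inputs, n_out):
    """Toy hash repartition: route each key to output partition key % n_out.

    Inputs are drained one after another, which is only one of many possible
    arrival orders; a real engine may interleave them differently.
    """
    outputs = [[] for _ in range(n_out)]
    for part in inputs:
        for key in part:
            outputs[key % n_out].append(key)
    return outputs


def is_sorted(xs):
    """Return True if xs is in non-decreasing order."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


# Three input partitions, each individually sorted on the key.
inputs = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
outputs = hash_repartition(inputs, 2)

# Sorting each output partition on its own restores the ordering
# (the analogue of a sort that preserves partitioning).
resorted = [sorted(p) for p in outputs]

print(outputs)   # [[4, 2, 8, 6], [1, 7, 5, 3, 9]]
print(resorted)  # [[2, 4, 6, 8], [1, 3, 5, 7, 9]]
```

In this model the sort happens after the repartition and runs per partition, which matches the shape of the plan fragment above.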
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]