asubiotto commented on code in PR #16985:
URL: https://github.com/apache/datafusion/pull/16985#discussion_r2244811476
##########
datafusion/physical-plan/src/unnest.rs:
##########
@@ -101,8 +101,22 @@ impl UnnestExec {
         input: &Arc<dyn ExecutionPlan>,
         schema: SchemaRef,
     ) -> PlanProperties {
+        // Extract equivalence properties from input plan
+        let input_eq_properties = input.equivalence_properties();
+        let input_oeq_class = input_eq_properties.oeq_class();
+        let orderings = input_oeq_class.orderings().to_vec();
+        let eq_group = input_eq_properties.eq_group();
+        let constraints = input_eq_properties.constraints();
+

Review Comment:
   I think that rather than creating a new equivalence properties object and trying to set all known fields, I would follow an approach similar to what filter does: https://github.com/apache/datafusion/blob/4bc66c80d560581f527dee5774fb4f0479786d3e/datafusion/physical-plan/src/filter.rs#L267

   That code clones the full input equivalence properties and then modifies them. In this case I believe we should just remove the orderings on the columns we're unnesting. I'm not very familiar with this code, so I'm not sure how to express this exactly (a rough sketch of the idea is appended at the end of this message).


##########
datafusion/sqllogictest/test_files/unnest.slt:
##########
@@ -941,3 +941,60 @@ where min_height * width1 = (
 )
 ----
 4 7 4 28
+
+## Unnest with ordering on unrelated column is preserved
+query TT
+EXPLAIN WITH unnested AS (SELECT
+    ROW_NUMBER() OVER () AS generated_id,
+    unnest(array[value]) as ar
+    FROM range(1,5)) SELECT array_agg(ar) FROM unnested group by generated_id;
+----
+logical_plan
+01)Projection: array_agg(unnested.ar)
+02)--Aggregate: groupBy=[[unnested.generated_id]], aggr=[[array_agg(unnested.ar)]]
+03)----SubqueryAlias: unnested
+04)------Projection: generated_id, __unnest_placeholder(make_array(range().value),depth=1) AS UNNEST(make_array(range().value)) AS ar
+05)--------Unnest: lists[__unnest_placeholder(make_array(range().value))|depth=1] structs[]
+06)----------Projection: row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS generated_id, make_array(range().value) AS __unnest_placeholder(make_array(range().value))
+07)------------WindowAggr: windowExpr=[[row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING]]
+08)--------------TableScan: range() projection=[value]
+physical_plan
+01)ProjectionExec: expr=[array_agg(unnested.ar)@1 as array_agg(unnested.ar)]
+02)--AggregateExec: mode=FinalPartitioned, gby=[generated_id@0 as generated_id], aggr=[array_agg(unnested.ar)], ordering_mode=Sorted
+03)----SortExec: expr=[generated_id@0 ASC NULLS LAST], preserve_partitioning=[true]
+04)------CoalesceBatchesExec: target_batch_size=8192
+05)--------RepartitionExec: partitioning=Hash([generated_id@0], 4), input_partitions=4
+06)----------AggregateExec: mode=Partial, gby=[generated_id@0 as generated_id], aggr=[array_agg(unnested.ar)], ordering_mode=Sorted
+07)------------ProjectionExec: expr=[generated_id@0 as generated_id, __unnest_placeholder(make_array(range().value),depth=1)@1 as ar]
+08)--------------UnnestExec
+09)----------------ProjectionExec: expr=[row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING@1 as generated_id, make_array(value@0) as __unnest_placeholder(make_array(range().value))]
+10)------------------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
+11)--------------------BoundedWindowAggExec: wdw=[row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: Field { name: "row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING", data_type: UInt64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING], mode=[Sorted]
+12)----------------------LazyMemoryExec: partitions=1, batch_generators=[range: start=1, end=5, batch_size=8192]
+
+## Unnest without ordering
+query TT
+EXPLAIN WITH unnested AS (SELECT
+    random() AS generated_id,
+    unnest(array[value]) as ar
+    FROM range(1,5)) SELECT array_agg(ar) FROM unnested group by generated_id;

Review Comment:
   I think it's fine to only have a single EXPLAIN test case for the ordering.
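
Appended sketch referenced in the first comment. This is a minimal Rust illustration of the suggested direction (clone the input's equivalence properties as FilterExec does, then drop only the orderings that touch an unnested column), not the PR's actual code. The helper name, the unnested_column_indices parameter, and the clear_orderings / add_new_orderings / collect_columns / oeq_class().iter() calls and import paths are assumptions about the EquivalenceProperties API, which has shifted between DataFusion versions, so treat this as a sketch of the idea rather than a drop-in implementation.

use std::sync::Arc;

use datafusion::physical_expr::utils::collect_columns;
use datafusion::physical_expr::{EquivalenceProperties, LexOrdering};
use datafusion::physical_plan::ExecutionPlan;

/// Illustrative only: derive UnnestExec's equivalence properties by cloning
/// the input's properties (as FilterExec does) and dropping the orderings
/// that mention a column being unnested. `unnested_column_indices` is a
/// hypothetical parameter standing in for however UnnestExec identifies the
/// columns it expands.
fn unnest_equivalence_properties(
    input: &Arc<dyn ExecutionPlan>,
    unnested_column_indices: &[usize],
) -> EquivalenceProperties {
    // Clone the full input properties so that equivalence groups and
    // constraints on untouched columns carry over unchanged.
    let mut eq_properties = input.equivalence_properties().clone();

    // Keep only the orderings whose sort expressions never reference an
    // unnested column; those remain valid because unnesting only repeats
    // rows, while anything involving an unnested column is invalidated.
    let retained: Vec<LexOrdering> = eq_properties
        .oeq_class()
        .iter()
        .filter(|ordering| {
            ordering.iter().all(|sort_expr| {
                collect_columns(&sort_expr.expr)
                    .iter()
                    .all(|col| !unnested_column_indices.contains(&col.index()))
            })
        })
        .cloned()
        .collect();

    // Assumed API: replace the ordering set wholesale. The method names
    // (`clear_orderings`, `add_new_orderings`) may differ by version.
    eq_properties.clear_orderings();
    eq_properties.add_new_orderings(retained);
    eq_properties
}

Two refinements are left out for brevity: keeping the longest prefix of each ordering that precedes the first unnested column instead of discarding the whole ordering, and mapping the cloned properties onto UnnestExec's output schema, since the unnested columns change type there.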