alamb opened a new pull request, #20174: URL: https://github.com/apache/datafusion/pull/20174
## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/20173 ## Rationale for this change This is a regression we found in our unit tests. When upgrading to DataFusion 52 we hit a bug in our test cases where pre-sorted data was being resorted ## What changes are included in this PR? Fix the bug ## Are these changes tested? Yes I tried for a while with codex to get a .slt test purely via SQL but was not successful. To trigger the bug the output_ordering needs to not yet be projected before eq_properties() runs (that’s the only case the project_orderings(...) fix addresses). The SQL planner seems to always build output_ordering in the base schema, so the bug doesn’t manifest. Here is what I tried <details><summary>Details</summary> <p> ```sql # Create a table ordered by (a, b, c) using inline data. A filtered ORDER BY on b # should not introduce an extra SortExec after projection reorders the columns. statement ok COPY ( VALUES (1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 3), (2, 1, 1), (2, 1, 2), (2, 2, 1) ) TO 'test_files/scratch/order/ordered_abc/part-0.csv' STORED AS CSV; statement ok CREATE EXTERNAL TABLE ordered_abc ( a INT, b INT, c INT ) STORED AS CSV WITH ORDER (a ASC, b ASC, c ASC) LOCATION 'test_files/scratch/order/ordered_abc/' OPTIONS ('format.has_header' 'false'); query TT EXPLAIN SELECT b, a FROM ordered_abc WHERE a = 1 ORDER BY b; ---- logical_plan 01)Sort: ordered_abc.b ASC NULLS LAST 02)--Projection: ordered_abc.b, ordered_abc.a 03)----Filter: ordered_abc.a = Int32(1) 04)------TableScan: ordered_abc projection=[a, b], partial_filters=[ordered_abc.a = Int32(1)] physical_plan 01)SortPreservingMergeExec: [b@0 ASC NULLS LAST] 02)--ProjectionExec: expr=[b@1 as b, a@0 as a] 03)----FilterExec: a@0 = 1 04)------RepartitionExec: partitioning=RoundRobinBatch(2), input_partitions=1, maintains_sort_order=true 05)--------DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/order/ordered_abc/part-0.csv]]}, projection=[a, b], output_ordering=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST], file_type=csv, has_header=false statement ok drop table ordered_abc; ``` </p> </details> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
