zhuqi-lucas opened a new pull request, #22501: URL: https://github.com/apache/datafusion/pull/22501
## Which issue does this PR close?

Cherry-pick of #22493 onto `branch-54`.

## Rationale for this change

`branch-54` includes #21956 (`feat: globally reorder files and row groups by statistics for TopK queries`), which introduced a regression: for plain-column, multi-file scans where the on-disk file order does not match the declared sort order, `SortExec` was no longer eliminated even when stats-based reorder produced non-overlapping file groups whose declared ordering re-validated.

#22493 restores the pre-#21956 sort-elimination behaviour by re-validating `output_ordering` after `rebuild_with_source` reorders files, and (per @adriangb's correctness follow-up) restoring the original hint-free `file_source` on the Inexact→Exact upgrade so leftover `reverse_row_groups` / `sort_order_for_reorder` hints don't mis-order row groups within a single file once the `SortExec` safety net is gone.

## What changes are included in this PR?

Straight cherry-pick of merge commit `94c58d086`. Includes:

- `FileScanConfig::try_pushdown_sort` Inexact arm: re-validate, upgrade to Exact (with `file_source` restore), guard with NULL safety + early-return
- `rebuild_with_source`: `match (all_non_overlapping, is_exact)` decision table for `keep_ordering`
- SLT updates restoring `SortExec` elimination expectations + Tests 5b/5c/8b for the NULL-safety and same-min row-group edge cases

## Are these changes tested?

Cherry-picked cleanly (auto-merge in `sort_pushdown.rs`).

- `cargo build -p datafusion-datasource`: passes.
- `cargo test -p datafusion-sqllogictest --test sqllogictests -- sort_pushdown`: passes.

## Are there any user-facing changes?

Same as #22493: plain-column wrong-order-files cases regain `SortExec` elimination when files happen to be non-overlapping by statistics. No new API.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
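For readers unfamiliar with the idea behind "non-overlapping by statistics", here is a minimal sketch of the check described in the rationale. It is not the DataFusion implementation: `FileRange` and `reorder_if_non_overlapping` are hypothetical names, the sort key is assumed to be a single ascending `i64` column, and a missing min/max is treated conservatively as "cannot prove ordering", which is one plausible reading of the NULL-safety guard.

```rust
/// Hypothetical per-file statistics for one ascending sort column.
/// `None` models a missing min/max (for example, an all-NULL column).
#[derive(Debug, Clone, PartialEq)]
struct FileRange {
    name: &'static str,
    min: Option<i64>,
    max: Option<i64>,
}

/// Reorder files by their `min` statistic and return the new order only if
/// the reordered sequence is provably sorted end to end.
/// Returns `None` when any statistic is missing or when two ranges overlap,
/// in which case a caller would have to keep its sort step.
fn reorder_if_non_overlapping(mut files: Vec<FileRange>) -> Option<Vec<FileRange>> {
    // Conservative guard: without both bounds nothing can be proven.
    if files.iter().any(|f| f.min.is_none() || f.max.is_none()) {
        return None;
    }

    // Order by (min, max) so files sharing the same min are tie-broken
    // deterministically.
    files.sort_by_key(|f| (f.min, f.max));

    // Each file must end no later than the next one starts. Touching
    // boundaries (max == next min) still preserve a non-strict ascending order.
    let non_overlapping = files.windows(2).all(|w| w[0].max <= w[1].min);

    non_overlapping.then_some(files)
}
```

Given files covering `[10, 19]` and `[0, 9]` in that on-disk order, the function returns them reordered as `[0, 9]`, `[10, 19]`. Ranges such as `[0, 15]` and `[10, 19]` yield `None`, because reading the files back to back would not produce sorted output.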
