zhuqi-lucas opened a new pull request, #22501:
URL: https://github.com/apache/datafusion/pull/22501

   ## Which issue does this PR close?
   
   Cherry-pick of #22493 onto `branch-54`.
   
   ## Rationale for this change
   
   `branch-54` includes #21956 (`feat: globally reorder files and row groups by 
statistics for TopK queries`), which introduced a regression: for plain-column, 
multi-file scans where the on-disk file order does not match the declared sort 
order, `SortExec` was no longer eliminated even when stats-based reorder 
produced non-overlapping file groups whose declared ordering re-validated.
   
   #22493 restores the pre-#21956 sort-elimination behaviour by re-validating 
`output_ordering` after `rebuild_with_source` reorders files, and (per 
@adriangb's correctness follow-up) restoring the original hint-free 
`file_source` on the Inexact→Exact upgrade so leftover `reverse_row_groups` / 
`sort_order_for_reorder` hints don't mis-order row groups within a single file 
once the `SortExec` safety net is gone.
   
   ## What changes are included in this PR?
   
   Straight cherry-pick of merge commit `94c58d086`. Includes:
   
   - `FileScanConfig::try_pushdown_sort` Inexact arm: re-validate the ordering, upgrade to 
Exact (restoring the original `file_source`), with a NULL-safety guard and an early return
   - `rebuild_with_source`: `match (all_non_overlapping, is_exact)` decision 
table for keep_ordering
   - SLT updates restoring `SortExec` elimination expectations + Tests 5b/5c/8b 
for the NULL-safety and same-min row-group edge cases
   
   ## Are these changes tested?
   
   Cherry-picked cleanly (auto-merge in `sort_pushdown.rs`). Verified with:
   
   - `cargo build -p datafusion-datasource`: passes
   - `cargo test -p datafusion-sqllogictest --test sqllogictests -- sort_pushdown`: passes
   
   ## Are there any user-facing changes?
   
   Same as #22493: plain-column scans whose files are in the wrong on-disk order regain 
`SortExec` elimination when the files are non-overlapping by statistics. No new API.
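
   To illustrate what "non-overlapping by statistics" means here, the following is a minimal sketch and not the DataFusion implementation. `FileStats` and `reorder_and_check` are hypothetical names. The sketch assumes a single ascending `i64` sort column and does not model NULLs. Treating a shared boundary value as non-overlapping is also an assumption of the sketch.

```rust
/// Hypothetical, simplified per-file statistics for a single sort column.
/// Real file statistics are richer (typed values, null counts, exactness).
#[derive(Debug, Clone, PartialEq)]
struct FileStats {
    name: &'static str,
    min: i64,
    max: i64,
}

/// Reorder files by their (min, max) statistics, then report whether
/// consecutive files are non-overlapping for an ascending sort.
///
/// Assumption of this sketch: a shared boundary value (prev.max == next.min)
/// still counts as non-overlapping, because concatenating the files keeps
/// the values non-decreasing. NULL handling is deliberately left out.
fn reorder_and_check(mut files: Vec<FileStats>) -> (Vec<FileStats>, bool) {
    // Put the files in statistics order, regardless of their listed order.
    files.sort_by_key(|f| (f.min, f.max));

    // Every file must end no later than the next file begins.
    let non_overlapping = files.windows(2).all(|w| w[0].max <= w[1].min);

    (files, non_overlapping)
}
```

   In this sketch, files listed as `b` (10..19) then `a` (0..9) are reordered to `a`, `b` and reported as non-overlapping, so an ordering over them could in principle be re-validated. Files covering 0..15 and 10..20 overlap, so a sort would still be needed.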


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

