adriangb opened a new pull request, #20304: URL: https://github.com/apache/datafusion/pull/20304
## Summary

- Sort files within each file group by min/max statistics during sort pushdown to better align with the requested ordering
- When files are non-overlapping and within-file ordering is guaranteed (Parquet returns `Exact`), the `SortExec` is completely eliminated
- When files overlap, best-effort statistics-based reordering is applied with `SortExec` retained for correctness (`Inexact`)
- `ParquetSource::try_pushdown_sort` now returns `Exact` when the file's natural ordering already satisfies the request, enabling sort elimination for the same-direction case
- Add SLT integration tests covering non-overlapping sort elimination, overlapping files, reverse scan with mixed file naming, and multi-group merging

Related to https://github.com/apache/datafusion/issues/19724

## Test plan

- [x] `cargo test -p datafusion-sqllogictest --test sqllogictests -- sort_pushdown` passes
- [x] `cargo test -p datafusion-sqllogictest --test sqllogictests -- parquet_sorted_statistics` passes
- [x] New SLT tests verify EXPLAIN plans show correct optimizer behavior (sort elimination, SortExec retention, reverse_row_groups, file ordering)
- [x] New SLT tests verify query result correctness for all scenarios

🤖 Generated with [Claude Code](https://claude.com/claude-code)
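
The reordering and overlap check described in the summary can be sketched roughly as follows. This is a standalone illustration, not the code in this PR: `FileStats`, `Pushdown`, and `reorder_by_stats` are hypothetical names, the statistics are simplified to a single `i64` sort column, and treating touching ranges (one file's max equal to the next file's min) as non-overlapping is an assumption.

```rust
/// Hypothetical stand-in for one file's min/max statistics on the sort column.
#[derive(Debug, Clone, PartialEq)]
struct FileStats {
    name: &'static str,
    min: i64,
    max: i64,
}

/// Hypothetical mirror of the `Exact` / `Inexact` outcomes named in the summary.
#[derive(Debug, PartialEq)]
enum Pushdown {
    /// Files are ordered and disjoint, so a sort on top would be redundant.
    Exact,
    /// Files were reordered on a best-effort basis; a sort is still required.
    Inexact,
}

/// Reorder the files of one group by their statistics, then check whether
/// each adjacent pair of value ranges is disjoint in the requested direction.
/// Assumes every file is already sorted internally in that direction.
fn reorder_by_stats(files: &mut [FileStats], descending: bool) -> Pushdown {
    if descending {
        // Largest values first: order by max, break ties by min.
        files.sort_by(|a, b| b.max.cmp(&a.max).then(b.min.cmp(&a.min)));
    } else {
        // Smallest values first: order by min, break ties by max.
        files.sort_by(|a, b| a.min.cmp(&b.min).then(a.max.cmp(&b.max)));
    }

    // Assumption: ranges that merely touch still concatenate into sorted output.
    let non_overlapping = files.windows(2).all(|pair| {
        if descending {
            pair[0].min >= pair[1].max
        } else {
            pair[0].max <= pair[1].min
        }
    });

    if non_overlapping {
        Pushdown::Exact
    } else {
        Pushdown::Inexact
    }
}
```

Under these assumptions, three files covering `[20, 30]`, `[0, 9]`, and `[10, 19]` would be reordered and reported as `Exact` in either direction, while two files covering `[0, 15]` and `[10, 19]` would be reported as `Inexact`, which corresponds to the case where `SortExec` is retained.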
