Dandandan opened a new pull request, #22617: URL: https://github.com/apache/datafusion/pull/22617
## Which issue does this PR close?

None yet.

## Rationale for this change

Large `ORDER BY ... LIMIT` queries can sort only key columns first and materialize non-key columns after the TopK rows are known. This reduces the amount of data read and moved through TopK execution, while preserving result order through row-number based selection.

## What changes are included in this PR?

- Adds a `LateMaterialization` physical optimizer rule enabled by default via `datafusion.optimizer.enable_row_number_topk_late_materialization`
- Adds `LateTopKMaterializationExec`, including a generic fallback path and a Parquet/file row-selection pushdown path
- Adds `FileRowsSelection` support to file scan extensions and Parquet access planning
- Documents the new optimizer config and updates the optimizer rule reference
- Adds optimizer and Parquet integration coverage

## Are these changes tested?

Yes.

- `cargo fmt --all`
- `cargo clippy --all-targets --all-features -- -D warnings`
- `cargo check -p datafusion-datasource-parquet -p datafusion-physical-optimizer -p datafusion`
- `cargo clippy -p datafusion-physical-optimizer -p datafusion-datasource-parquet -- -D warnings`
- `cargo test -p datafusion --test core_integration late_materialization`
- `cargo test -p datafusion --test parquet_integration file_rows_selection`
- `git diff --check`

## Are there any user-facing changes?

Yes. The new optimizer rule is enabled by default and can be disabled with `datafusion.optimizer.enable_row_number_topk_late_materialization = false`.
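
For readers new to the idea in the rationale, here is a minimal, std-only sketch of row-number based late materialization. It is not the code in this PR: the `Table` struct and the `top_k_row_numbers` and `materialize` functions are hypothetical names invented for illustration. It also uses a full sort where a real TopK operator would keep a bounded heap. The point is only that phase 1 touches the key column alone, and phase 2 reads the non-key column for just the selected row numbers while keeping their order.

```rust
/// Hypothetical in-memory table: one sort key column and one wide payload column.
struct Table {
    key: Vec<i64>,
    payload: Vec<String>,
}

/// Phase 1: inspect only the key column and return the row numbers of the
/// `k` smallest keys in ascending key order (ties broken by row number).
/// A full sort is used here for brevity; it stands in for a real TopK.
fn top_k_row_numbers(keys: &[i64], k: usize) -> Vec<usize> {
    let mut rows: Vec<usize> = (0..keys.len()).collect();
    rows.sort_by_key(|&r| (keys[r], r));
    rows.truncate(k);
    rows
}

/// Phase 2: fetch the payload only for the selected rows, preserving the
/// order established in phase 1.
fn materialize(table: &Table, rows: &[usize]) -> Vec<(i64, String)> {
    rows.iter()
        .map(|&r| (table.key[r], table.payload[r].clone()))
        .collect()
}

fn main() {
    let table = Table {
        key: vec![42, 7, 19, 7, 3],
        payload: ["a", "b", "c", "d", "e"].iter().map(|s| s.to_string()).collect(),
    };

    // Roughly: SELECT key, payload FROM t ORDER BY key LIMIT 3
    let rows = top_k_row_numbers(&table.key, 3);
    let result = materialize(&table, &rows);

    println!("selected row numbers: {:?}", rows);
    println!("materialized rows: {:?}", result);
}
```

In this sketch only three of the five payload values are ever cloned. In the PR, the analogous saving would come from pushing the selected row numbers down to the file scan instead of indexing an in-memory vector.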
