Dandandan opened a new pull request, #21588:
URL: https://github.com/apache/datafusion/pull/21588
## Which issue does this PR close?

Extends #21580 (row group reorder by statistics during sort pushdown) to also reorder by GROUP BY keys, as suggested in the [review](https://github.com/apache/datafusion/pull/21580#discussion_r2191783519).

## Rationale for this change

When an `AggregateExec` sits above a Parquet data source, reading row groups in grouping-key order clusters similar group values together. This:

1. **Reduces active cardinality** of aggregation hash tables: fewer live entries at any time
2. **Improves CPU cache locality** for hash table lookups during aggregation

## What changes are included in this PR?

- Add `try_pushdown_groupby_order()` to the `ExecutionPlan`, `DataSource`, and `FileSource` traits (default: no-op)
- Implement it along the `DataSourceExec` → `FileScanConfig` → `ParquetSource` chain, reusing the existing `sort_order_for_reorder` / `reorder_by_statistics` infrastructure from #21580
- Add a new `ReorderByGroupKeys` physical optimizer rule that detects `AggregateExec` → `DataSourceExec` patterns and pushes the grouping key expressions down
- Run the rule before `PushdownSort`, so sort pushdown can override the reorder hint when a sort requirement is present
- Add SLT tests for GROUP BY correctness with the optimization active

## Are these changes tested?

Yes. SLT tests in `sort_pushdown.slt` (Test I) verify correct GROUP BY SUM and COUNT results with the optimization active on multi-row-group Parquet files.

## Are there any user-facing changes?

No user-facing API changes. The optimization is applied automatically when aggregating over Parquet data.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
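The mechanism in the changes list can be illustrated with a small, self-contained Rust sketch. The sketch has three parts:

- A trait hook whose default is a no-op.
- A Parquet-like source that reorders its row groups from min/max statistics when given a grouping key.
- A rule that matches an aggregate directly over a scan and pushes the key down.

Everything here is hypothetical. `RowGroupStats`, `GroupByOrderHint`, `ParquetLikeSource`, `CsvLikeSource`, `Plan`, `reorder_by_min_max`, and `reorder_by_group_keys` are illustrative stand-ins, not DataFusion's actual traits or this PR's code. The sketch assumes a single integer grouping column with per-row-group min/max statistics, and it assumes only the first grouping key drives the order.

```rust
// Hypothetical, self-contained sketch; none of these types are DataFusion APIs.

/// Assumed min/max statistics of the grouping column for one row group (i64 key).
#[derive(Debug, Clone, Copy)]
struct RowGroupStats {
    min: i64,
    max: i64,
}

/// Stand-in for the new trait hook: by default a source ignores the hint.
trait GroupByOrderHint {
    /// Returns true if the source accepted the grouping-key order hint.
    fn try_pushdown_groupby_order(&mut self, _group_key: &str) -> bool {
        false
    }
}

/// Stand-in for a Parquet scan: per-row-group stats plus the order row groups are read in.
struct ParquetLikeSource {
    key_column: String,
    stats: Vec<RowGroupStats>,
    read_order: Vec<usize>,
}

impl GroupByOrderHint for ParquetLikeSource {
    fn try_pushdown_groupby_order(&mut self, group_key: &str) -> bool {
        if group_key != self.key_column {
            // No statistics for this column: keep the existing read order.
            return false;
        }
        self.read_order = reorder_by_min_max(&self.stats);
        true
    }
}

/// A source without statistics keeps the default no-op behavior.
struct CsvLikeSource;
impl GroupByOrderHint for CsvLikeSource {}

/// Row-group indices sorted by (min, max) of the key; the stable sort keeps file order on ties.
fn reorder_by_min_max(stats: &[RowGroupStats]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..stats.len()).collect();
    order.sort_by_key(|&i| (stats[i].min, stats[i].max));
    order
}

/// Toy plan containing only the two node kinds the rule cares about.
enum Plan {
    Aggregate { group_by: Vec<String>, input: Box<Plan> },
    Scan(ParquetLikeSource),
}

/// Matches Aggregate -> Scan and pushes the first grouping key into the scan.
/// Returns true if the scan accepted the hint; any other shape is left untouched.
fn reorder_by_group_keys(plan: &mut Plan) -> bool {
    if let Plan::Aggregate { group_by, input } = plan {
        if let (Some(key), Plan::Scan(src)) = (group_by.first(), &mut **input) {
            return src.try_pushdown_groupby_order(key);
        }
    }
    false
}
```

Consider row groups whose key ranges are [300, 399], [0, 99], and [100, 199]. Pushing `region_id` through an `Aggregate` over a `Scan` gives the read order `[1, 2, 0]`. A source using the default method, or a key without statistics, leaves the order unchanged.

Sorting by `(min, max)` is only one simple choice. A real implementation would also need to handle nulls, overlapping ranges, and missing statistics.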
