zhuqi-lucas opened a new issue, #21582:
URL: https://github.com/apache/datafusion/issues/21582
## Is your feature request related to a problem or challenge?
The existing `sort_pushdown_sorted` benchmark covers the **Exact** path
(sort elimination, scan limit). However, the Inexact path optimizations —
reverse scan (#19064) and row group reorder by statistics (#21580) — are not
benchmarked.
Without an Inexact benchmark, we can't:
- Measure the performance impact of row group reorder
- Validate future improvements (dynamic RG pruning #21399, global reorder
  per @Dandandan's suggestion in #21580)
- Detect regressions in the Inexact code path
## Describe the solution you'd like
Extend `benchmarks/bench.sh` and queries under
`benchmarks/queries/sort_pushdown/` to add Inexact scenarios:
1. **Data**: Generate a single large file with multiple row groups where row
   groups have **overlapping or out-of-order statistics** (forces Inexact path).
   Can be done by:
   - Writing data in non-sorted order with small `max_row_group_size`
   - Creating synthetic data with controlled row group boundaries
2. **Queries** (`benchmarks/queries/sort_pushdown/q5.sql`, `q6.sql`, ...):
   - `SELECT * FROM t ORDER BY col ASC LIMIT 10`: TopK + RG reorder
   - `SELECT * FROM t ORDER BY col DESC LIMIT 10`: TopK + reverse scan + RG
     reorder
   - `SELECT * FROM t ORDER BY col ASC LIMIT 1000`: larger LIMIT
   - Wide-row variant: `SELECT *` with many columns to show row-level filter
     benefit
3. **Baseline comparison**: With/without
   `datafusion.optimizer.enable_sort_pushdown` to isolate the optimization's
   impact.
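
To make the data shape and the expected effect concrete, here is a rough, standard-library-only Python sketch. It is a toy model, not DataFusion's implementation: `make_row_groups`, `has_overlap`, and `top_k_asc` are hypothetical helpers, and the sizes and jitter are arbitrary. It builds row groups whose min/max ranges overlap and are written out of order, then counts how many row groups an `ORDER BY col ASC LIMIT k` scan has to read with and without reordering by min statistics.

```python
import heapq
import random


def make_row_groups(num_rows, rows_per_group, jitter, seed=0):
    """Split a nearly sorted column into row groups with overlapping ranges."""
    rng = random.Random(seed)
    # Each value drifts up to `jitter` from its sorted position, so
    # neighbouring row groups get overlapping min/max statistics.
    values = [i + rng.randint(-jitter, jitter) for i in range(num_rows)]
    groups = [values[i:i + rows_per_group]
              for i in range(0, num_rows, rows_per_group)]
    rng.shuffle(groups)  # also place the row groups out of order
    return groups


def has_overlap(groups):
    """True if any two row groups have intersecting [min, max] ranges."""
    ranges = sorted((min(g), max(g)) for g in groups)
    # After sorting by min, any overlap shows up between adjacent ranges.
    return any(prev[1] >= cur[0] for prev, cur in zip(ranges, ranges[1:]))


def top_k_asc(groups, k, reorder):
    """Return (k smallest values, number of row groups actually read)."""
    order = sorted(groups, key=min) if reorder else groups
    heap = []  # max-heap via negation; holds the k smallest values seen
    groups_read = 0
    for group in order:
        if len(heap) == k and min(group) >= -heap[0]:
            continue  # statistics show this group cannot improve the result
        groups_read += 1
        for value in group:
            if len(heap) < k:
                heapq.heappush(heap, -value)
            elif value < -heap[0]:
                heapq.heapreplace(heap, -value)
    return sorted(-v for v in heap), groups_read


groups = make_row_groups(num_rows=10_000, rows_per_group=100, jitter=50)
print("overlapping statistics:", has_overlap(groups))
for reorder in (False, True):
    _, groups_read = top_k_asc(groups, k=10, reorder=reorder)
    print(f"reorder={reorder}: read {groups_read} of {len(groups)} row groups")
```

The gap in row groups read between the two runs is the kind of signal the Inexact benchmark would need to surface. The DESC case is symmetric, using max statistics and the k largest values.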
## Additional context
- Parent: #17348 (sort pushdown optimization)
- #19064 — TopK with reverse row group scan
- #21580 — Row group reorder by statistics (needs this benchmark to show
impact)
- #21399 — Dynamic row group pruning (would also benefit from this benchmark)