geoffreyclaude opened a new pull request, #23003:
URL: https://github.com/apache/datafusion/pull/23003

   ## Which issue does this PR close?
   
   This PR does not close an issue. It adds a first-class benchmark target for an existing `sort-tpch` mode.
   
   ## Rationale for this change
   
   `dfbench sort-tpch` already supports TopK queries over input declared as sorted with `--sorted --limit`, but `bench.sh` only exposed the unsorted TopK wrapper, as `topk_tpch`.
   
   That made the sorted TopK path harder to run from the benchmark bot and easier to miss during performance work. Adding `topk_sorted_tpch` gives reviewers and contributors a named target for the sorted-input TopK case:
   
   ```bash
   ./benchmarks/bench.sh run topk_sorted_tpch
   ```
   
   The new target uses `--limit 100`, making it the sorted counterpart to the existing `topk_tpch` benchmark.
   
   ## What changes are included in this PR?
   
   - Adds `topk_sorted_tpch` to the benchmark script help text.
   - Reuses the existing TPC-H SF1 parquet data setup.
   - Adds a `run_topk_sorted_tpch` wrapper around `dfbench sort-tpch --sorted --limit 100`.
   - Writes results to `run_topk_sorted_tpch.json`.
   - Documents the new benchmark target in `benchmarks/README.md`.
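   The wrapper described in the list above can be sketched as a small shell function. This is a hypothetical sketch, not the code added by this PR: the `tpch_sf1` directory name, the `RESULTS_DIR` default, and the `cargo run --release` default for `CARGO_COMMAND` are assumptions. The `dfbench` flags are only the ones shown in the smoke-run output later in this description; the flag that actually writes `run_topk_sorted_tpch.json` is not shown there, so the sketch only prints the intended results path.

   ```bash
   #!/usr/bin/env bash
   # Hypothetical sketch of run_topk_sorted_tpch; the real function in
   # benchmarks/bench.sh may differ.
   run_topk_sorted_tpch() {
       # Assumed location of the shared TPC-H SF1 parquet data.
       local tpch_dir="${DATA_DIR:-./data}/tpch_sf1"
       # Assumed results directory; only the file name comes from the PR.
       local results_file="${RESULTS_DIR:-./results}/run_topk_sorted_tpch.json"
       echo "RESULTS_FILE: ${results_file}"
       echo "Running sorted TopK TPC-H benchmark..."
       # CARGO_COMMAND is deliberately unquoted so a multi-word command
       # ("cargo run --release") splits into separate arguments.
       # CARGO_COMMAND=echo turns the call into a dry run that prints it.
       ${CARGO_COMMAND:-cargo run --release} --bin dfbench -- sort-tpch \
           --iterations 5 \
           --path "${tpch_dir}" \
           --sorted \
           --limit 100
   }
   ```

   With `CARGO_COMMAND=echo DATA_DIR=/tmp/df-topk-bench-data run_topk_sorted_tpch`, the function prints the `sort-tpch` invocation instead of running it, which mirrors the dry-run approach used in the validation commands.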
   
   ## Are these changes tested?
   
   Validated with:
   
   ```bash
   bash -n benchmarks/bench.sh
   CARGO_COMMAND=echo DATA_DIR=/tmp/df-topk-bench-data RESULTS_NAME=topk_sorted_tpch_smoke ./benchmarks/bench.sh run topk_sorted_tpch
   git diff --check
   cargo fmt --all
   cargo clippy --all-targets --all-features -- -D warnings
   ```
   
   The smoke run verified that the script dispatches to:
   
   ```bash
   dfbench sort-tpch --iterations 5 --path ... --sorted --limit 100
   ```
   
   ## Are there any user-facing changes?
   
   No engine or API behavior changes. This only adds a new opt-in benchmark target.
   

