hhhizzz opened a new pull request, #10135:
URL: https://github.com/apache/arrow-rs/pull/10135
# Which issue does this PR close?
- Part of #8846.
- Part of #7456.
- Split out from #9956.
# Rationale for this change
This PR is the first smaller PR split out from #9956 ("Optimize parquet row
filter auto strategy with adaptive fallback").
The goal is to land the benchmark coverage first, before changing row-filter
planning or execution behavior. This gives follow-up PRs a stable benchmark
baseline already on `main`, making it easier to compare each later behavior
change against the same benchmark cases.
Planned split from #9956:
1. Add benchmark baseline cases. This PR.
2. Split row-selection strategy / sparse mask correctness changes.
3. Add post-filter execution primitives.
4. Add Auto policy / adaptive materialization core.
5. Add policy refinements for projected predicates, fixed-prefix guards, and
   cacheable predicate cases.
# What changes are included in this PR?
This PR adds benchmark coverage only. The diff is limited to benchmark
targets under `parquet/benches`, with no changes to production reader code or
public APIs.
It extends `arrow_reader_row_filter` with:
- strategy comparison cases for:
  - manual full-scan post-filtering;
  - current `RowSelectionPolicy::Auto`;
  - explicit `Selectors`;
  - explicit `Mask`;
- focused row-filter shapes inspired by ClickBench and TPC-DS workloads;
- projected-predicate cases;
- count-only / filter-only / fixed-width / variable-width projection cases;
- nested whole-root output benchmark coverage;
- focused projected-scan cases that do not construct a `RowFilter`.
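To illustrate what the strategy comparison measures, here is a minimal sketch in plain Rust. It does not use the `parquet` crate: `decode`, `post_filter`, and `selective_read` are hypothetical names chosen for illustration, and the sketch assumes `keep` has the same length as `column`. It contrasts decoding every row and filtering afterwards with decoding only the rows a predicate selected, and counts how many values each path materializes.

```rust
/// Hypothetical stand-in for decoding one value of a payload column.
/// `decoded` counts how many values have been materialized so far.
fn decode(column: &[i64], row: usize, decoded: &mut usize) -> i64 {
    *decoded += 1;
    column[row]
}

/// Full-scan post-filtering: decode every row, then keep the rows
/// where the predicate result is `true`.
fn post_filter(column: &[i64], keep: &[bool]) -> (Vec<i64>, usize) {
    let mut decoded = 0;
    let all: Vec<i64> = (0..column.len())
        .map(|row| decode(column, row, &mut decoded))
        .collect();
    let out = all
        .into_iter()
        .zip(keep)
        .filter_map(|(value, &k)| k.then_some(value))
        .collect();
    (out, decoded)
}

/// Selection-driven read: decode only the rows the predicate selected.
/// Assumes `keep.len() == column.len()`.
fn selective_read(column: &[i64], keep: &[bool]) -> (Vec<i64>, usize) {
    let mut decoded = 0;
    let mut out = Vec::new();
    for (row, &k) in keep.iter().enumerate() {
        if k {
            out.push(decode(column, row, &mut decoded));
        }
    }
    (out, decoded)
}
```

Both functions return the same rows. Only the second element of the tuple, the number of decoded values, differs, and that difference is the kind of cost the strategy cases are meant to expose.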
It also extends `row_selection_cursor` with shape-focused selector/mask
cases that vary:
- selected-run length;
- selectivity;
- primitive vs variable-width payloads.
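The shape parameters above can be sketched in plain Rust as follows. This is an illustration only: `Run`, `mask_to_runs`, `selectivity`, and `mean_selected_run` are hypothetical names, not the `parquet` crate's own types or functions. The sketch converts a boolean mask into run-length selectors and computes the two shape metrics the benchmark cases vary.

```rust
/// Hypothetical run-length selector: a run of consecutive rows that is
/// either read (`skip == false`) or skipped (`skip == true`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Run {
    row_count: usize,
    skip: bool,
}

/// Convert a boolean mask (`true` = selected) into run-length selectors.
fn mask_to_runs(mask: &[bool]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for &selected in mask {
        let skip = !selected;
        // Extend the current run when the row has the same kind.
        if let Some(last) = runs.last_mut() {
            if last.skip == skip {
                last.row_count += 1;
                continue;
            }
        }
        runs.push(Run { row_count: 1, skip });
    }
    runs
}

/// Fraction of rows that are selected (0.0 for an empty mask).
fn selectivity(mask: &[bool]) -> f64 {
    if mask.is_empty() {
        return 0.0;
    }
    mask.iter().filter(|&&s| s).count() as f64 / mask.len() as f64
}

/// Mean length of the selected runs, or `None` if nothing is selected.
fn mean_selected_run(runs: &[Run]) -> Option<f64> {
    let selected: Vec<usize> = runs
        .iter()
        .filter(|r| !r.skip)
        .map(|r| r.row_count)
        .collect();
    if selected.is_empty() {
        return None;
    }
    Some(selected.iter().sum::<usize>() as f64 / selected.len() as f64)
}
```

Two masks with the same selectivity can have very different run lengths. Many short runs produce a long selector list, while a mask stays the same size regardless, which is why the cases vary the two dimensions independently.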
This PR intentionally does not change production reader behavior.
# Are these changes tested?
Yes. This PR was validated with:
```bash
cargo fmt -- parquet/benches/arrow_reader_row_filter.rs \
    parquet/benches/row_selection_cursor.rs
cargo check -p parquet --bench row_selection_cursor --features arrow
cargo check -p parquet --bench arrow_reader_row_filter --features arrow,async
git diff --check
```
No benchmark results are claimed in this PR. The purpose is to add baseline
benchmark coverage so that later PRs can report comparable performance evidence.
# Are there any user-facing changes?
No. This only changes benchmark code.