[PR] bench(parquet): add row filter strategy baseline cases [arrow-rs]

via GitHub Fri, 12 Jun 2026 02:59:52 -0700


hhhizzz opened a new pull request, #10135:
URL: https://github.com/apache/arrow-rs/pull/10135


   # Which issue does this PR close?
   
   - Part of #8846.
   - Part of #7456.
   - Split out from #9956.
   
   # Rationale for this change
   
   This PR is the first smaller PR split out from #9956 ("Optimize parquet row 
filter auto strategy with adaptive fallback").
   
   The goal is to land the benchmark coverage first, before changing row-filter 
planning or execution behavior. This gives follow-up PRs a stable benchmark 
baseline already on `main`, making it easier to compare each later behavior 
change against the same benchmark cases.
   
   Planned split from #9956:
   
   1. Add benchmark baseline cases. This PR.
   2. Split row-selection strategy / sparse mask correctness changes.
   3. Add post-filter execution primitives.
   4. Add Auto policy / adaptive materialization core.
   5. Add policy refinements for projected predicates, fixed-prefix guards, and 
cacheable predicate cases.
   
   # What changes are included in this PR?
   
   This PR adds benchmark coverage only. The diff is limited to benchmark 
targets under `parquet/benches`, with no changes to production reader code or 
public APIs.
   
   It extends `arrow_reader_row_filter` with:
   
   - strategy comparison cases for:
     - manual full-scan post-filtering;
     - current `RowSelectionPolicy::Auto`;
     - explicit `Selectors`;
     - explicit `Mask`;
   - focused row-filter shapes inspired by ClickBench and TPC-DS workloads;
   - projected-predicate cases;
   - count-only / filter-only / fixed-width / variable-width projection cases;
   - nested whole-root output benchmark coverage;
   - projected scan focus cases that do not construct a `RowFilter`.
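
For readers unfamiliar with the strategies being compared, the following is a minimal std-only sketch of the difference between a run-length selector representation and a boolean mask. It is not the benchmark code from this PR and does not use the parquet crate: `Run`, `mask_to_runs`, `filter_with_mask`, and `filter_with_runs` are hypothetical names chosen for illustration, and the actual `Selectors` and `Mask` strategies in the reader may differ in detail.

```rust
/// Hypothetical stand-in for a run-length selector (illustration only,
/// not the parquet crate's type): `row_count` consecutive rows are either
/// all skipped or all selected.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Run {
    row_count: usize,
    skip: bool,
}

/// Collapse a per-row boolean mask into run-length selectors.
fn mask_to_runs(mask: &[bool]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for &keep in mask {
        let skip = !keep;
        // Extend the current run when the row has the same skip/select state.
        if let Some(last) = runs.last_mut() {
            if last.skip == skip {
                last.row_count += 1;
                continue;
            }
        }
        runs.push(Run { row_count: 1, skip });
    }
    runs
}

/// Mask-style filtering: visit every row and keep those whose bit is set.
fn filter_with_mask(values: &[i64], mask: &[bool]) -> Vec<i64> {
    values
        .iter()
        .zip(mask)
        .filter_map(|(&v, &keep)| keep.then_some(v))
        .collect()
}

/// Selector-style filtering: copy whole selected runs and jump over skipped
/// ones. Assumes the runs cover at most `values.len()` rows.
fn filter_with_runs(values: &[i64], runs: &[Run]) -> Vec<i64> {
    let mut out = Vec::new();
    let mut offset = 0;
    for run in runs {
        if !run.skip {
            out.extend_from_slice(&values[offset..offset + run.row_count]);
        }
        offset += run.row_count;
    }
    out
}
```

Both functions return the same rows for the same selection. What differs is the amount of per-run versus per-row work, which is the kind of trade-off the strategy comparison cases are meant to expose.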
   
   It also extends `row_selection_cursor` with shape-focused selector/mask 
cases that vary:
   
   - selected-run length;
   - selectivity;
   - primitive vs variable-width payloads.
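
As an illustration of how selected-run length and selectivity can be varied independently, here is a small std-only sketch that builds a periodic selection pattern. It is an assumption about one reasonable way to generate such shapes, not the generator used in `row_selection_cursor`; `periodic_mask` and `observed_selectivity` are hypothetical helpers.

```rust
/// Build a periodic selection: each period starts with `run_len` selected
/// rows, followed by enough skipped rows to approximate `selectivity`.
/// Hypothetical helper for illustration only.
fn periodic_mask(total_rows: usize, run_len: usize, selectivity: f64) -> Vec<bool> {
    assert!(run_len > 0, "run_len must be positive");
    assert!(
        selectivity > 0.0 && selectivity <= 1.0,
        "selectivity must be in (0, 1]"
    );
    // Period length such that run_len / period is close to the target selectivity.
    let period = ((run_len as f64) / selectivity).round() as usize;
    // Guard against rounding producing a period shorter than the run itself.
    let period = period.max(run_len);
    (0..total_rows).map(|i| i % period < run_len).collect()
}

/// Fraction of rows selected by a mask (0.0 for an empty mask).
fn observed_selectivity(mask: &[bool]) -> f64 {
    if mask.is_empty() {
        return 0.0;
    }
    mask.iter().filter(|&&keep| keep).count() as f64 / mask.len() as f64
}
```

With this shape, holding selectivity fixed while changing `run_len` changes only how fragmented the selection is, which is the dimension on which selector-based and mask-based execution are likely to differ.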
   
   This PR intentionally does not change production reader behavior.
   
   # Are these changes tested?
   
   Yes. This PR was validated with:
   
   ```bash
   cargo fmt -- parquet/benches/arrow_reader_row_filter.rs parquet/benches/row_selection_cursor.rs
   cargo check -p parquet --bench row_selection_cursor --features arrow
   cargo check -p parquet --bench arrow_reader_row_filter --features arrow,async
   git diff --check
   ```
   
   No benchmark results are claimed in this PR. Its purpose is to add baseline 
benchmark coverage so that later PRs can report comparable performance evidence.
   
   # Are there any user-facing changes?
   
   No. This only changes benchmark code.


