Re: [PR] bench(parquet): add row filter strategy baseline cases [arrow-rs]

via GitHub Mon, 29 Jun 2026 01:08:31 -0700


hhhizzz commented on PR #10135:
URL: https://github.com/apache/arrow-rs/pull/10135#issuecomment-4830289281


   @alamb  Thanks for the feedback. I pushed an update that narrows this PR 
toward a smaller, maintainable benchmark baseline rather than a broad 
policy-tuning sweep.
   
   Main changes in this revision:
   
   - Kept the shared `arrow_reader_common` fixture so the synthetic parquet 
data setup is not duplicated across reader benchmarks.
   - Reduced `arrow_reader_row_filter` so it remains a reader regression 
baseline:
     - removed the sync strategy matrix
     - kept only a small async strategy matrix with representative fixed-width 
and `Utf8View` filters
     - reduced the nested-output focus case to `full_post_filter` vs `Auto`
   - Reduced `arrow_reader_materialization_policy` to 10 representative cases, 
each still comparing:
     - full post-filtering
     - `Auto`
     - forced `Mask`
     - forced `Selectors`
   
   The intent is that `arrow_reader_row_filter` covers general 
reader/filter/projection regressions, while 
`arrow_reader_materialization_policy` keeps just enough focused coverage to 
detect whether `Auto` is choosing a sensible fallback path for cases like high 
selectivity, projected predicate columns, count-only output, and variable-width 
deferred output.
   
   I also measured the trimmed default Criterion runtime (tested on a 24-core 
AMD64 Linux machine):
   
   | target | benchmark ids | elapsed |
   |---|---:|---:|
   | `arrow_reader_row_filter` | 74 | `12:42.39` |
   | `arrow_reader_materialization_policy` | 40 | `6:38.09` |
   | combined | 114 | `19:20.48` |
   
   For comparison, before the reduction these two targets took `35:32.88` 
combined. So this keeps the fallback/policy signal while cutting the default 
runtime by roughly 46%.
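   
   As a sanity check on the numbers above, here is a minimal Rust sketch that 
re-derives the combined time and the relative reduction from the elapsed 
strings. The helper names `parse_elapsed` and `reduction` are hypothetical and 
are not part of this PR or of the benchmark code; the sketch only assumes the 
`M:SS.ss` format shown in the table.
   
   ```rust
   /// Parse an elapsed string in `M:SS.ss` form (as in the table) into seconds.
   /// Hypothetical helper for illustration only.
   fn parse_elapsed(s: &str) -> f64 {
       let (min, sec) = s.split_once(':').expect("expected M:SS.ss");
       let min: f64 = min.parse().expect("minutes must be numeric");
       let sec: f64 = sec.parse().expect("seconds must be numeric");
       min * 60.0 + sec
   }
   
   /// Fraction of runtime saved when going from `before` to `after` seconds.
   fn reduction(before: f64, after: f64) -> f64 {
       (before - after) / before
   }
   
   fn main() {
       // Per-target timings from the table.
       let row_filter = parse_elapsed("12:42.39");
       let policy = parse_elapsed("6:38.09");
       let combined = row_filter + policy;
   
       // Combined runtime before the benchmark matrix was trimmed.
       let before = parse_elapsed("35:32.88");
   
       println!("combined: {:.2} s", combined);
       println!("reduction: {:.1} %", reduction(before, combined) * 100.0);
   }
   ```
   
   The two per-target timings should add up to the `19:20.48` combined row, 
and the reduction relative to `35:32.88` comes out at a bit under half.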
   
   Validation:
   
   ```bash
   cargo bench -p parquet --features arrow,async --no-run \
     --bench arrow_reader_row_filter --bench arrow_reader_materialization_policy
   ```
   
   One note: I left `row_selection_cursor` as a separate target because it 
exercises the lower-level selector-vs-mask shape boundary. If you would prefer 
this PR to focus only on the reader/materialization benchmarks, I can split 
that target into a follow-up.


