hhhizzz commented on PR #10135:
URL: https://github.com/apache/arrow-rs/pull/10135#issuecomment-4830289281
@alamb Thanks for the feedback. I pushed an update that narrows this PR
toward a smaller, maintainable benchmark baseline rather than a broad
policy-tuning sweep.
Main changes in this revision:
- Kept the shared `arrow_reader_common` fixture so the synthetic parquet
data setup is not duplicated across reader benchmarks.
- Reduced `arrow_reader_row_filter` so it remains a reader regression
baseline:
  - removed the sync strategy matrix
  - kept only a small async strategy matrix with representative fixed-width
    and `Utf8View` filters
  - reduced the nested-output focus case to `full_post_filter` vs `Auto`
- Reduced `arrow_reader_materialization_policy` to 10 representative cases,
each still comparing:
  - full post-filtering
  - `Auto`
  - forced `Mask`
  - forced `Selectors`
The intent is that `arrow_reader_row_filter` covers general
reader/filter/projection regressions, while
`arrow_reader_materialization_policy` keeps just enough focused coverage to
detect whether `Auto` is choosing a sensible fallback path for cases like high
selectivity, projected predicate columns, count-only output, and variable-width
deferred output.
I also measured the trimmed default Criterion runtime (tested on a 24-core
AMD64 Linux machine):
| target | benchmark ids | elapsed (mm:ss) |
|---|---:|---:|
| `arrow_reader_row_filter` | 74 | `12:42.39` |
| `arrow_reader_materialization_policy` | 40 | `6:38.09` |
| combined | 114 | `19:20.48` |
For comparison, before the reduction these two targets took about `35:32.88`
combined. So this keeps the fallback/policy signal while cutting the default
runtime by roughly 46%.
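The timing arithmetic above can be checked with a small sketch. It assumes the elapsed values are in `minutes:seconds` form; the helper names `parse_elapsed` and `reduction` are hypothetical and are not part of the benchmarks in this PR.

```rust
/// Parse an elapsed string such as "12:42.39" into seconds.
/// Assumption: the format is minutes:seconds with fractional seconds.
fn parse_elapsed(s: &str) -> Option<f64> {
    let (min, sec) = s.split_once(':')?;
    let min: f64 = min.trim().parse().ok()?;
    let sec: f64 = sec.trim().parse().ok()?;
    Some(min * 60.0 + sec)
}

/// Fractional runtime reduction going from `before` to `after` (both in seconds).
fn reduction(before: f64, after: f64) -> f64 {
    (before - after) / before
}
```

With the figures from the table, `parse_elapsed("12:42.39")` plus `parse_elapsed("6:38.09")` matches the combined `19:20.48`, and `reduction` from `35:32.88` to `19:20.48` comes out at about 0.456.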
Validation (compile-only check of both targets):
```bash
cargo bench -p parquet --features arrow,async --no-run \
  --bench arrow_reader_row_filter \
  --bench arrow_reader_materialization_policy
```
One note: I left `row_selection_cursor` as a separate target because it
exercises the lower-level selector-vs-mask shape boundary. If you would prefer
this PR to focus only on the reader/materialization benchmarks, I can split
that target into a follow-up.
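To illustrate what a selector-vs-mask shape boundary could look like, here is a minimal, std-only sketch. Everything in it is an assumption made for illustration: the `Run` and `Shape` types, the `mask_to_runs` and `choose_shape` helpers, and the average-run-length threshold are hypothetical, and this is not how the parquet crate or `Auto` actually decides.

```rust
/// A run of consecutive rows that are all skipped or all selected.
/// Hypothetical type for illustration; not the parquet crate's own selector type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Run {
    skip: bool,
    row_count: usize,
}

/// Run-length encode a boolean mask, where `true` means "keep this row".
fn mask_to_runs(mask: &[bool]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for &keep in mask {
        let skip = !keep;
        // Extend the previous run when the row has the same skip/select state.
        if let Some(last) = runs.last_mut() {
            if last.skip == skip {
                last.row_count += 1;
                continue;
            }
        }
        runs.push(Run { skip, row_count: 1 });
    }
    runs
}

/// Which representation a reader might prefer (hypothetical).
#[derive(Debug, PartialEq)]
enum Shape {
    Mask,
    Selectors,
}

/// Toy heuristic: many short runs favor a mask, few long runs favor selectors.
/// `min_avg_run` is an invented tuning knob, not a real parquet setting.
fn choose_shape(runs: &[Run], min_avg_run: usize) -> Shape {
    if runs.is_empty() {
        return Shape::Selectors;
    }
    let rows: usize = runs.iter().map(|r| r.row_count).sum();
    if rows / runs.len() < min_avg_run {
        Shape::Mask
    } else {
        Shape::Selectors
    }
}
```

Under this toy rule, a fragmented mask such as `[true, true, false, false, false, true]` encodes to three short runs and would pick `Shape::Mask` with a threshold of 4, while 100 kept rows followed by 100 skipped rows would pick `Shape::Selectors`. A benchmark like `row_selection_cursor` would be what measures where such a crossover really lies.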