Re: [PR] bench(parquet): add row filter strategy baseline cases [arrow-rs]

via GitHub Wed, 24 Jun 2026 01:48:31 -0700


hhhizzz commented on PR #10135:
URL: https://github.com/apache/arrow-rs/pull/10135#issuecomment-4787489300


   > Thank you @hhhizzz --- this is quite a comprehensive set of benchmarks -- 
however, I wonder if we really need all this coverage?
   > 
   > For example, there are 27 variants of `FilterType` -- I see there are some 
comments about some of the filter shapes matching particular TPCH / TPCDS 
predicate shapes, but I am trying ti figure out why a benchmark that compares 
all the different is going to be helpful in the long run to avoid regressions 
-- I worry taht this benchmark will generate so much data that we will find it 
hard to run / reason about
   > 
   > It seems like this benchmark may be most useful a development/tuning 
benchmark as it helps establish baseline timings for row-filter materialization 
policy choices: That is useful when designing future heuristics but it is hard 
to grok the results
   > 
   > Are there any important filter cases we should add to arrow_row_filter?
   
   I took another pass at narrowing the benchmark scope.
   
   The materialization-policy target is now explicitly focused on 
policy-boundary/tuning cases rather than being a broad row-filter regression 
matrix:
   
   - reduced the materialization-policy benchmark to one group: 
`arrow_reader_materialization_policy_async_focus`
   - pruned duplicate/weak cases, including the small scalar-prefix case
   - moved chained predicate-order coverage out of materialization and into 
`arrow_reader_row_filter_async_predicate_order_focus`
   - kept the materialization cases as 21 named reader-level shapes rather than 
a full Cartesian product
   - extracted the shared synthetic reader fixture so the row-filter and 
materialization benches use the same setup
   
   For `arrow_reader_row_filter`, the important missing case I added is chained 
`RowFilter` predicate ordering: fixed-width predicate before var-width 
predicate, and the reverse order. The existing `Composite` case evaluated both 
predicates inside one `ArrowPredicateFn`, so it did not exercise pruning 
between sequential predicates.
   
   If this still feels too broad, I can trim the materialization-policy target 
further, but the current split is intended to keep generic row-filter behavior 
in `arrow_reader_row_filter` and leave the separate materialization target for 
future `Auto` policy tuning.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] bench(parquet): add row filter strategy baseline cases [arrow-rs]

Reply via email to