hhhizzz commented on PR #10135: URL: https://github.com/apache/arrow-rs/pull/10135#issuecomment-4787489300
> Thank you @hhhizzz --- this is quite a comprehensive set of benchmarks -- however, I wonder if we really need all this coverage? > > For example, there are 27 variants of `FilterType` -- I see there are some comments about some of the filter shapes matching particular TPCH / TPCDS predicate shapes, but I am trying ti figure out why a benchmark that compares all the different is going to be helpful in the long run to avoid regressions -- I worry taht this benchmark will generate so much data that we will find it hard to run / reason about > > It seems like this benchmark may be most useful a development/tuning benchmark as it helps establish baseline timings for row-filter materialization policy choices: That is useful when designing future heuristics but it is hard to grok the results > > Are there any important filter cases we should add to arrow_row_filter? I took another pass at narrowing the benchmark scope. The materialization-policy target is now explicitly focused on policy-boundary/tuning cases rather than being a broad row-filter regression matrix: - reduced the materialization-policy benchmark to one group: `arrow_reader_materialization_policy_async_focus` - pruned duplicate/weak cases, including the small scalar-prefix case - moved chained predicate-order coverage out of materialization and into `arrow_reader_row_filter_async_predicate_order_focus` - kept the materialization cases as 21 named reader-level shapes rather than a full Cartesian product - extracted the shared synthetic reader fixture so the row-filter and materialization benches use the same setup For `arrow_reader_row_filter`, the important missing case I added is chained `RowFilter` predicate ordering: fixed-width predicate before var-width predicate, and the reverse order. The existing `Composite` case evaluated both predicates inside one `ArrowPredicateFn`, so it did not exercise pruning between sequential predicates. If this still feels too broad, I can trim the materialization-policy target further, but the current split is intended to keep generic row-filter behavior in `arrow_reader_row_filter` and leave the separate materialization target for future `Auto` policy tuning. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
