alamb opened a new pull request, #7470: URL: https://github.com/apache/arrow-rs/pull/7470
# Which issue does this PR close?

- Closes https://github.com/apache/arrow-rs/issues/7460
- Part of https://github.com/apache/arrow-rs/issues/7456

# Rationale for this change

We are trying to improve the performance of row filter application in the Parquet arrow reader, and part of that effort is a benchmark that we can use to guide optimization.

However, as discussed in https://github.com/apache/arrow-rs/pull/7428, the `arrow_reader_row_filter` microbenchmark doesn't currently reflect the actual performance we see in our end-to-end application (DataFusion).

```shell
cargo bench --all-features --bench arrow_reader_row_filter
```

Thus, we think we need to create a benchmark that uses the actual ClickBench dataset with appropriate filtering.

- See https://github.com/apache/arrow-rs/issues/7460 for more details

# What changes are included in this PR?

1. Adds a new `arrow_reader_clickbench` benchmark

This benchmark tests applying the actual ClickBench filters (and column materialization):

1. Single file and partitioned (100 file) datasets
2. async and sync readers
3. All ClickBench query patterns

# Are there any user-facing changes?

New benchmark, and hopefully improved filter / projection performance as a result

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org