alamb commented on issue #7363: URL: https://github.com/apache/arrow-rs/issues/7363#issuecomment-2786661682
> I am interested for this topic, if anything i can help the testing to compare the performance or code improvement as a follow-up?

Thank you so much @zhuqi-lucas! That is great news.

I think the first thing we should do is:

1. Run the existing [arrow_reader](https://github.com/apache/arrow-rs/blob/main/parquet/benches/arrow_reader.rs) benchmarks against https://github.com/apache/arrow-rs/pull/6921 and see if they show any regressions
2. Add new benchmarks in arrow_reader (or maybe in a new arrow_reader_row_filter) that test reading Parquet data with a row filter (aka [with_row_filter](https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/struct.ArrowReaderBuilder.html#method.with_row_filter)). There should be benchmarks both with:
   1. a filter on a column that is also selected
   2. a filter on a column that is not also selected (aka `projection=a, filter=b > 1` or something)

Does that make sense?
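To make the two benchmark cases concrete, here is a minimal std-only sketch of what each scan has to do. It is a toy model and does not use the parquet crate: `Table`, `Col`, `Scan`, and `scan_gt` are hypothetical names invented for illustration, and the real benchmarks would go through `with_row_filter` instead.

```rust
/// Toy stand-in for a two-column file. This is NOT the parquet crate API.
struct Table {
    a: Vec<i32>,
    b: Vec<i32>,
}

/// Hypothetical column identifier for the toy table.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Col {
    A,
    B,
}

impl Table {
    fn column(&self, col: Col) -> &[i32] {
        match col {
            Col::A => &self.a,
            Col::B => &self.b,
        }
    }
}

/// Result of a filtered scan: the projected rows plus the columns
/// that had to be decoded to produce them.
struct Scan {
    rows: Vec<Vec<i32>>,
    decoded: Vec<Col>,
}

/// Evaluate `filter_col > threshold`, then return only `projection`
/// for the rows that pass.
fn scan_gt(table: &Table, projection: &[Col], filter_col: Col, threshold: i32) -> Scan {
    // Step 1: the predicate only needs the filter column.
    let mask: Vec<bool> = table
        .column(filter_col)
        .iter()
        .map(|v| *v > threshold)
        .collect();

    // Step 2: the filter column is always decoded; projected columns
    // are added only if they are not already in the list.
    let mut decoded = vec![filter_col];
    for col in projection {
        if !decoded.contains(col) {
            decoded.push(*col);
        }
    }

    // Step 3: materialize the projected columns for surviving rows.
    let rows: Vec<Vec<i32>> = mask
        .iter()
        .enumerate()
        .filter(|(_, keep)| **keep)
        .map(|(i, _)| projection.iter().map(|col| table.column(*col)[i]).collect())
        .collect();

    Scan { rows, decoded }
}
```

With `a = [10, 20, 30]` and `b = [0, 2, 5]`, calling `scan_gt(&t, &[Col::B], Col::B, 1)` models case 1 (the filter column is also selected, so only `b` is decoded), while `scan_gt(&t, &[Col::A], Col::B, 1)` models case 2 (`projection=a, filter=b > 1`, so `b` is decoded for the predicate but never returned). Those are the two paths the new benchmarks should measure separately.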