Re: [I] Parquet decoder / decoded page Cache [arrow-rs]

via GitHub Tue, 08 Apr 2025 07:44:06 -0700


alamb commented on issue #7363:
URL: https://github.com/apache/arrow-rs/issues/7363#issuecomment-2786661682


   > I am interested in this topic. Is there anything I can help with, such as testing to compare the performance, or code improvements as a follow-up?
   
   Thank you so much, @zhuqi-lucas! That is great news.
   
   I think the first things we should do are:
   1. Run the existing [arrow_reader](https://github.com/apache/arrow-rs/blob/main/parquet/benches/arrow_reader.rs) benchmarks against https://github.com/apache/arrow-rs/pull/6921 and see if they show any regressions.
   2. Add new benchmarks in `arrow_reader` (or maybe in a new `arrow_reader_row_filter`) that test reading Parquet data with a row filter (aka [with_row_filter](https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/struct.ArrowReaderBuilder.html#method.with_row_filter)). There should be benchmarks for both (a) a filter on a column that is also selected and (b) a filter on a column that is not selected (e.g. `projection=a, filter=b > 1` or something similar).
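To make the two row-filter cases in item 2 concrete, here is a minimal, dependency-free sketch. It is a toy model only and does not use the `parquet` crate: `Table`, `ScanResult`, `scan`, and `sample_table` are hypothetical names invented for illustration, and the "decode" log is an assumption of this model (one entry per pass over a column), not a measurement of what the real reader does.

```rust
/// Toy in-memory "file": named integer columns of equal length.
/// (Hypothetical type for illustration; not part of arrow-rs.)
struct Table {
    columns: Vec<(&'static str, Vec<i64>)>,
}

impl Table {
    /// Look up a column by name, panicking if it does not exist.
    fn column(&self, name: &str) -> &[i64] {
        &self
            .columns
            .iter()
            .find(|(n, _)| *n == name)
            .expect("unknown column")
            .1
    }
}

/// Output of one scan: the selected rows plus a log of column passes.
struct ScanResult {
    /// One entry per surviving row, values in projection order.
    rows: Vec<Vec<i64>>,
    /// Name of each column this toy model "decoded", in order.
    decoded: Vec<String>,
}

/// Evaluate `pred` on `filter_col`, then materialize `projection`
/// for the rows that passed.
fn scan(
    table: &Table,
    projection: &[&str],
    filter_col: &str,
    pred: impl Fn(i64) -> bool,
) -> ScanResult {
    let mut decoded = Vec::new();

    // Pass 1: read the filter column and record which rows survive.
    decoded.push(filter_col.to_string());
    let keep: Vec<usize> = table
        .column(filter_col)
        .iter()
        .enumerate()
        .filter(|(_, v)| pred(**v))
        .map(|(i, _)| i)
        .collect();

    // Pass 2: read each projected column for the surviving rows.
    let mut rows = vec![Vec::new(); keep.len()];
    for name in projection {
        decoded.push(name.to_string());
        let col = table.column(name);
        for (row, &i) in rows.iter_mut().zip(&keep) {
            row.push(col[i]);
        }
    }

    ScanResult { rows, decoded }
}

/// Small fixture with two columns, `a` and `b`.
fn sample_table() -> Table {
    Table {
        columns: vec![("a", vec![10, 20, 30, 40]), ("b", vec![0, 1, 2, 3])],
    }
}
```

Calling `scan(&sample_table(), &["a", "b"], "b", |v| v > 1)` models case (a), where the filter column is also selected, and `scan(&sample_table(), &["a"], "b", |v| v > 1)` models case (b). In this toy model, case (a) passes over column `b` twice (once for the filter, once for the output) while case (b) passes over it once, which is one reason it seems worth benchmarking the two cases separately.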
   
   Does that make sense?
   


