[I] arrow_reader_row_filter benchmark doesn't capture page cache improvements [arrow-rs]

via GitHub Wed, 30 Apr 2025 05:04:56 -0700


alamb opened a new issue, #7460:
URL: https://github.com/apache/arrow-rs/issues/7460


   - Part of https://github.com/apache/arrow-rs/issues/7456
   
   We are trying to improve the performance of row filter application, and 
part of that is a benchmark that we can use to guide optimization efforts. 
   
   ```shell
   cargo bench --all-features --bench arrow_reader_row_filter
   ```
   
   However, as shown in https://github.com/apache/arrow-rs/pull/7428, we have a 
case where an end-to-end query in DataFusion shows a performance benefit, but 
the benchmark does not show the same improvement.
   
   This ticket tracks figuring out why the benchmark doesn't show an 
improvement even when the end-to-end query does.
   
   
   > Interesting, the decoder cache doesn't seem to help much on my test 
machine (which is some crappy GCP VM). I couldn't reproduce the results listed 
on [#7363 (comment)](https://github.com/apache/arrow-rs/issues/7363#issuecomment-2816670842) 🤔
   
   > Thank you @alamb, there seems to be no obvious improvement compared to 
main. This branch only improves PointLookup for the big 1000000-line data set, 
compared to the original better-decode. 
   
   > I agree, we need to find out how to mock the ClickBench result from the 
arrow-rs side.
   
   _Originally posted by @zhuqi-lucas in 
https://github.com/apache/arrow-rs/issues/7428#issuecomment-2838960224_
               


