alamb commented on PR #7513:
URL: https://github.com/apache/arrow-rs/pull/7513#issuecomment-2886990125

   I tested this branch using a query that filters and selects the same column.
(NOTE: it is critical to *NOT* use `--all-features`, as enabling all features
turns on `force_validate`.)
   
   ```shell
   cargo bench --features="arrow async" --bench arrow_reader_clickbench -- Q24
   ```
   
   Here are the benchmark results (30ms --> 22ms, about 25% faster):
   
   ```
   Gnuplot not found, using plotters backend
   Looking for ClickBench files starting in current_dir and all parent directories: "/Users/andrewlamb/Software/arrow-rs/parquet"
   arrow_reader_clickbench/sync/Q24
                           time:   [22.532 ms 22.604 ms 22.682 ms]
                           change: [-27.751% -27.245% -26.791%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     5 (5.00%) high mild
     1 (1.00%) high severe
   
   arrow_reader_clickbench/async/Q24
                           time:   [24.043 ms 24.171 ms 24.308 ms]
                           change: [-26.223% -25.697% -25.172%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     5 (5.00%) high mild
     1 (1.00%) high severe
   ```
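
As a quick sanity check on the headline number, here is a minimal sketch of how the relative change follows from a before and after time. It assumes the rounded 30 ms baseline quoted above (Criterion's own `change` line is computed against its stored baseline, which is not shown in this output), and the function names `percent_change` and `speedup` are hypothetical helpers, not part of arrow-rs or Criterion.

```rust
/// Relative change in running time as a percentage.
/// Negative values mean the new run is faster.
fn percent_change(before_ms: f64, after_ms: f64) -> f64 {
    (after_ms - before_ms) / before_ms * 100.0
}

/// Speedup factor: how many times faster the new run is.
fn speedup(before_ms: f64, after_ms: f64) -> f64 {
    before_ms / after_ms
}

fn main() {
    // Assumption: 30 ms is the rounded baseline from the comment,
    // not the exact baseline Criterion compared against.
    let before_ms = 30.0;
    // Median estimate reported for arrow_reader_clickbench/sync/Q24.
    let after_ms = 22.604;

    println!("change:  {:.1}%", percent_change(before_ms, after_ms));
    println!("speedup: {:.2}x", speedup(before_ms, after_ms));
}
```

With the rounded baseline this lands near the quoted 25%. Criterion's slightly larger figure suggests the stored baseline was a little above 30 ms.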
   
   I realize this branch currently uses more memory (to buffer the filter 
results), but I think the additional memory growth can be limited with a 
setting. 
   
   

