alamb commented on PR #7513: URL: https://github.com/apache/arrow-rs/pull/7513#issuecomment-2886990125
I tested this branch using a query that filters on and selects the same column. (NOTE: it is critical *NOT* to use `--all-features`, because that turns on the `force_validate` feature.)

```shell
cargo bench --features="arrow async" --bench arrow_reader_clickbench -- Q24
```

Here are the benchmark results (~30 ms --> ~22 ms, about 25% faster):

```
Gnuplot not found, using plotters backend
Looking for ClickBench files starting in current_dir and all parent directories: "/Users/andrewlamb/Software/arrow-rs/parquet"
arrow_reader_clickbench/sync/Q24
                        time:   [22.532 ms 22.604 ms 22.682 ms]
                        change: [-27.751% -27.245% -26.791%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
arrow_reader_clickbench/async/Q24
                        time:   [24.043 ms 24.171 ms 24.308 ms]
                        change: [-26.223% -25.697% -25.172%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
```

I realize this branch currently uses more memory (to buffer the filter results), but I think the additional memory growth can be limited with a setting.
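Criterion reports `change` as the relative difference from the previously saved baseline, i.e. `(new - old) / old`, so the old time can be backed out approximately as `new / (1 + change)`. The sketch below does that arithmetic with the point estimates from the Q24 output. The helper `implied_baseline_ms` is hypothetical and not part of criterion, and using the middle `time` and `change` values is an approximation.

```rust
/// Hypothetical helper: recover the approximate baseline time from the new
/// time and criterion's relative change, assuming change = (new - old) / old.
fn implied_baseline_ms(new_ms: f64, change_pct: f64) -> f64 {
    new_ms / (1.0 + change_pct / 100.0)
}

fn main() {
    // Point estimates (middle values) copied from the Q24 benchmark output.
    let runs = [("sync/Q24", 22.604, -27.245), ("async/Q24", 24.171, -25.697)];
    for (name, new_ms, change_pct) in runs {
        let old_ms = implied_baseline_ms(new_ms, change_pct);
        // Speedup as a ratio of old time to new time.
        let speedup = old_ms / new_ms;
        println!("{name}: ~{old_ms:.1} ms -> {new_ms:.1} ms ({speedup:.2}x)");
    }
}
```

For the sync run this works out to roughly 31 ms before and 22.6 ms after, which matches the rounded ~30 ms --> ~22 ms summary.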