Re: [PR] Update arrow_reader_row_filter benchmark to reflect ClickBench distribution [arrow-rs]

via GitHub Wed, 30 Apr 2025 22:47:34 -0700


zhuqi-lucas commented on PR #7461:
URL: https://github.com/apache/arrow-rs/pull/7461#issuecomment-2844151896


   > Unfortunately, even after adjusting the benchmark on this branch I still don't see major changes in #7428.
   > 
   > I will look more deeply tomorrow
   > 
   > ```shell
   > cargo bench --all-features --bench arrow_reader_row_filter -- Utf8ViewNonEmpty
   > ```
   > 
   > Main compared to #7428
   > 
   > ```
   > arrow_reader_row_filter/Utf8ViewNonEmpty/all_columns/async
   >                         time:   [4.1253 ms 4.1553 ms 4.1881 ms]
   >                         change: [-4.8097% -3.9190% -3.0939%] (p = 0.00 < 0.05)
   >                         Performance has improved.
   > Found 19 outliers among 100 measurements (19.00%)
   >   14 (14.00%) high mild
   >   5 (5.00%) high severe
   > arrow_reader_row_filter/Utf8ViewNonEmpty/all_columns/sync
   >                         time:   [4.2269 ms 4.2340 ms 4.2419 ms]
   >                         change: [-1.5246% -1.1130% -0.7616%] (p = 0.00 < 0.05)
   >                         Change within noise threshold.
   > Found 5 outliers among 100 measurements (5.00%)
   >   3 (3.00%) high mild
   >   2 (2.00%) high severe
   > arrow_reader_row_filter/Utf8ViewNonEmpty/exclude_filter_column/async
   >                         time:   [3.0754 ms 3.0802 ms 3.0857 ms]
   >                         change: [-3.2754% -2.7568% -2.2574%] (p = 0.00 < 0.05)
   >                         Performance has improved.
   > Found 11 outliers among 100 measurements (11.00%)
   >   6 (6.00%) high mild
   >   5 (5.00%) high severe
   > arrow_reader_row_filter/Utf8ViewNonEmpty/exclude_filter_column/sync
   >                         time:   [3.0774 ms 3.0839 ms 3.0909 ms]
   >                         change: [-1.4528% -1.1133% -0.7921%] (p = 0.00 < 0.05)
   >                         Change within noise threshold.
   > Found 7 outliers among 100 measurements (7.00%)
   >   6 (6.00%) high mild
   >   1 (1.00%) high severe
   > ```
   
   Thank you @alamb for this work. It looks like we still need to investigate further. Would it be possible to take a 10% sample of hit.parquet and run some benchmarks against it from the arrow-rs side?
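
One possible way to get such a sample is to keep roughly every tenth row group of the file. The sketch below is only an illustration and not something from this PR: `sample_row_groups` is a hypothetical helper, and it assumes the row groups are of similar size, so that 10% of the row groups is close to 10% of the rows. The resulting indices could then be handed to `ParquetRecordBatchReaderBuilder::with_row_groups` in the `parquet` crate.

```rust
/// Hypothetical helper: pick an evenly spaced subset of row-group indices
/// covering roughly `fraction` of a file that has `num_row_groups` row groups.
/// Assumes row groups hold a similar number of rows.
fn sample_row_groups(num_row_groups: usize, fraction: f64) -> Vec<usize> {
    if num_row_groups == 0 || fraction <= 0.0 {
        return Vec::new();
    }
    // Clamp so that a fraction above 1.0 simply selects every row group.
    let fraction = fraction.min(1.0);
    // Keep at least one row group so the sample is never empty.
    let keep = ((num_row_groups as f64 * fraction).round() as usize).max(1);
    // Spread the kept indices evenly across the file rather than taking
    // only the first `keep` row groups, which may not be representative.
    (0..keep).map(|i| i * num_row_groups / keep).collect()
}
```

For a file with 100 row groups, `sample_row_groups(100, 0.1)` selects indices 0, 10, 20, and so on up to 90. Spacing the indices out, instead of taking a prefix, is meant to keep the sample closer to the overall data distribution if the file is sorted or clustered.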

