JakeDern commented on PR #10128:
URL: https://github.com/apache/arrow-rs/pull/10128#issuecomment-4685992738
Pretty good improvement: ~42% for the dictionary case and ~20% for the delta
dictionary cases. I'm not 100% sure yet why the delta side improves less, but
I think this is worth taking on its own, and I can investigate further later.
Perf results from #10122:
```
➜ arrow-ipc git:(ipc-writer-dict-benches) cargo bench (StreamWriter|FileWriter)/write_10 --features zstd
zsh: no matches found: (StreamWriter|FileWriter)/write_10
➜ arrow-ipc git:(ipc-writer-dict-benches) cargo bench "(StreamWriter|FileWriter)/write_10" --features zstd
Finished `bench` profile [optimized] target(s) in 0.07s
Running benches/ipc_reader.rs (/home/jakedern/repos/arrow-rs/target/release/deps/ipc_reader-a1b491f58c77bb6a)
Running benches/ipc_writer.rs (/home/jakedern/repos/arrow-rs/target/release/deps/ipc_writer-6612be2d7eba35b1)
Benchmarking arrow_ipc_stream_writer/StreamWriter/write_10: Collecting 100 samples in estimated 5.5019 s (50k iterations)
arrow_ipc_stream_writer/StreamWriter/write_10
time: [107.53 µs 108.06 µs 108.61 µs]
change: [−2.9828% −0.9112% +0.7341%] (p = 0.39 > 0.05)
No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
Benchmarking arrow_ipc_stream_writer/StreamWriter/write_10/zstd: Collecting 100 samples in estimated 5.0248 s (1100 iterations)
arrow_ipc_stream_writer/StreamWriter/write_10/zstd
time: [4.5765 ms 4.6054 ms 4.6355 ms]
change: [−0.7831% +0.1488% +1.0639%] (p = 0.75 > 0.05)
No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
Benchmarking arrow_ipc_stream_writer/FileWriter/write_10: Collecting 100 samples in estimated 5.3861 s (50k iterations)
arrow_ipc_stream_writer/FileWriter/write_10
time: [106.14 µs 106.82 µs 107.54 µs]
change: [+1.1887% +2.7126% +4.6164%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
6 (6.00%) high mild
6 (6.00%) high severe
Benchmarking arrow_ipc_stream_writer/StreamWriter/write_10/dict: Collecting 100 samples in estimated 5.2009 s (71k iterations)
arrow_ipc_stream_writer/StreamWriter/write_10/dict
time: [60.775 µs 62.004 µs 63.440 µs]
change: [−6.4822% −3.5063% −0.6010%] (p = 0.03 < 0.05)
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
Benchmarking arrow_ipc_stream_writer/StreamWriter/write_10/dict/delta: Collecting 100 samples in estimated 5.0870 s (
arrow_ipc_stream_writer/StreamWriter/write_10/dict/delta
time: [128.47 µs 129.73 µs 130.88 µs]
change: [−1.8693% −0.0642% +1.7216%] (p = 0.95 > 0.05)
No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
Benchmarking arrow_ipc_stream_writer/FileWriter/write_10/dict/delta: Collecting 100 samples in estimated 5.5440 s (45
arrow_ipc_stream_writer/FileWriter/write_10/dict/delta
time: [130.29 µs 131.33 µs 132.26 µs]
change: [+1.8877% +2.8406% +3.8001%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) low severe
3 (3.00%) low mild
3 (3.00%) high mild
2 (2.00%) high severe
➜ arrow-ipc git:(ipc-writer-dict-benches)
```
Perf results from this branch:
```
➜ arrow-ipc git:(ipc-writer-collect-dicts) ✗ cargo bench "(StreamWriter|FileWriter)/write_10" --features zstd
Compiling arrow-ipc v59.0.0 (/home/jakedern/repos/arrow-rs/arrow-ipc)
Finished `bench` profile [optimized] target(s) in 2.55s
Running benches/ipc_reader.rs (/home/jakedern/repos/arrow-rs/target/release/deps/ipc_reader-a1b491f58c77bb6a)
Running benches/ipc_writer.rs (/home/jakedern/repos/arrow-rs/target/release/deps/ipc_writer-6612be2d7eba35b1)
Benchmarking arrow_ipc_stream_writer/StreamWriter/write_10: Collecting 100 samples in estimated 5.3935 s (50k iterations)
arrow_ipc_stream_writer/StreamWriter/write_10
time: [106.95 µs 107.85 µs 108.76 µs]
change: [−2.2269% −1.1032% −0.0394%] (p = 0.06 > 0.05)
No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
Benchmarking arrow_ipc_stream_writer/StreamWriter/write_10/zstd: Collecting 100 samples in estimated 5.0249 s (1100 iterations)
arrow_ipc_stream_writer/StreamWriter/write_10/zstd
time: [4.5629 ms 4.5901 ms 4.6184 ms]
change: [−1.1939% −0.3327% +0.5704%] (p = 0.47 > 0.05)
No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
Benchmarking arrow_ipc_stream_writer/FileWriter/write_10: Collecting 100 samples in estimated 5.0247 s (45k iterations)
arrow_ipc_stream_writer/FileWriter/write_10
time: [109.86 µs 110.45 µs 111.11 µs]
change: [+0.3417% +2.0979% +3.7750%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe
Benchmarking arrow_ipc_stream_writer/StreamWriter/write_10/dict: Collecting 100 samples in estimated 5.0650 s (136k iterations)
arrow_ipc_stream_writer/StreamWriter/write_10/dict
time: [37.300 µs 37.543 µs 37.807 µs]
change: [−43.963% −42.283% −40.548%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) high mild
4 (4.00%) high severe
Benchmarking arrow_ipc_stream_writer/StreamWriter/write_10/dict/delta: Collecting 100 samples in estimated 5.4418 s (
arrow_ipc_stream_writer/StreamWriter/write_10/dict/delta
time: [103.36 µs 104.28 µs 105.18 µs]
change: [−19.764% −18.621% −17.508%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) low mild
2 (2.00%) high mild
3 (3.00%) high severe
Benchmarking arrow_ipc_stream_writer/FileWriter/write_10/dict/delta: Collecting 100 samples in estimated 5.4730 s (56
arrow_ipc_stream_writer/FileWriter/write_10/dict/delta
time: [104.84 µs 105.65 µs 106.44 µs]
change: [−20.651% −20.021% −19.377%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild
```
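As a rough cross-check of the ~42% and ~20% figures, the sketch below recomputes the relative change directly from the middle `time:` estimates in the two runs above. The helper name `percent_change` is made up for illustration and is not part of arrow-rs or criterion. The `change:` lines are criterion's own estimate against its saved baseline, so they need not match this simple arithmetic exactly.

```rust
/// Relative change from `before` to `after`, in percent (negative means faster).
/// Hypothetical helper for illustration only.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    // (benchmark, #10122 point estimate in µs, this-branch point estimate in µs),
    // copied from the middle value of each `time:` line in the logs above.
    let runs = [
        ("StreamWriter/write_10/dict", 62.004, 37.543),
        ("StreamWriter/write_10/dict/delta", 129.73, 104.28),
        ("FileWriter/write_10/dict/delta", 131.33, 105.65),
    ];

    for (name, before, after) in runs {
        println!("{name}: {:+.1}%", percent_change(before, after));
    }
}
```

On these point estimates the dictionary case comes out at roughly a 39% reduction and both delta cases at just under 20%, which is in line with the figures quoted at the top.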