tustvold commented on PR #4376:
URL: https://github.com/apache/arrow-rs/pull/4376#issuecomment-1580839944
The benchmarks in #4378 show a small performance improvement from this change
(roughly 3% to 9% on the ListArray cases below), likely because it no longer
needs to buffer and split off definition levels and values.
```
arrow_array_reader/ListArray/plain encoded optional strings no NULLs
time: [1.5840 ms 1.5868 ms 1.5903 ms]
change: [-8.9378% -8.6442% -8.3995%] (p = 0.00 < 0.05)
Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
2 (2.00%) low mild
4 (4.00%) high mild
7 (7.00%) high severe
Benchmarking arrow_array_reader/ListArray/plain encoded optional strings half NULLs: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.1s, enable flat sampling, or reduce sample count to 60.
arrow_array_reader/ListArray/plain encoded optional strings half NULLs
time: [1.2136 ms 1.2143 ms 1.2150 ms]
change: [-2.9329% -2.8874% -2.8359%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
```
The flamegraph for this PR shows that reading the repetition levels is a
relatively small portion of the runtime, at least compared to the overheads
of stripping empty lists and padding nulls, which makes the improvement even
more impressive.
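
For readers unfamiliar with the terms above, here is a minimal sketch of what "stripping empty lists and padding nulls" means when turning repetition and definition levels into a list array. It is not the code in this PR or in the parquet crate. The `ListColumn` struct, the `assemble` function, and the `DEF_*` constants are hypothetical names, and the level numbering assumes an optional list of optional strings (maximum definition level 3, maximum repetition level 1).

```rust
// Hypothetical definition levels for an optional list of optional strings.
const DEF_NULL_LIST: i16 = 0; // the list itself is null
const DEF_EMPTY_LIST: i16 = 1; // the list is present but has no elements
const DEF_NULL_ELEMENT: i16 = 2; // an element slot exists but is null
const DEF_VALUE: i16 = 3; // an element with an actual value

/// Hypothetical, simplified stand-in for an Arrow list array.
#[derive(Debug, PartialEq)]
struct ListColumn {
    offsets: Vec<i32>,          // row i spans child[offsets[i]..offsets[i + 1]]
    list_valid: Vec<bool>,      // false where the list itself is null
    child: Vec<Option<String>>, // flattened elements, None for null elements
}

/// Illustrative only: assembles rows from level data and the non-null values.
fn assemble(def_levels: &[i16], rep_levels: &[i16], values: &[&str]) -> ListColumn {
    assert_eq!(def_levels.len(), rep_levels.len(), "one rep level per def level");
    let mut out = ListColumn {
        offsets: vec![0],
        list_valid: Vec::new(),
        child: Vec::new(),
    };
    let mut values = values.iter();

    for (i, (&def, &rep)) in def_levels.iter().zip(rep_levels).enumerate() {
        if rep == 0 {
            // A repetition level of 0 starts a new row; close the previous one.
            if i > 0 {
                out.offsets.push(out.child.len() as i32);
            }
            out.list_valid.push(def >= DEF_EMPTY_LIST);
        }
        match def {
            // "Stripping empty lists": null and empty lists occupy a level
            // entry but contribute no slot to the child array.
            DEF_NULL_LIST | DEF_EMPTY_LIST => {}
            // "Padding nulls": a null element has no stored value, so a
            // null slot must be inserted into the child array.
            DEF_NULL_ELEMENT => out.child.push(None),
            DEF_VALUE => {
                let v = values.next().expect("fewer values than DEF_VALUE levels");
                out.child.push(Some(v.to_string()));
            }
            other => panic!("unexpected definition level {}", other),
        }
    }
    // Close the final row, if there was one.
    if !def_levels.is_empty() {
        out.offsets.push(out.child.len() as i32);
    }
    out
}
```

Under these assumptions, the rows `[["a", null], null, [], ["b"]]` would be encoded with definition levels `[3, 2, 0, 1, 3]`, repetition levels `[0, 1, 0, 0, 0]`, and values `["a", "b"]`. Passing those to `assemble` yields offsets `[0, 2, 2, 2, 3]`, list validity `[true, false, true, true]`, and three child slots, one of them null.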
