[GitHub] [arrow-rs] tustvold commented on pull request #4376: Move record delimiting into ColumnReader (#4365)

via GitHub Wed, 07 Jun 2023 06:36:34 -0700


tustvold commented on PR #4376:
URL: https://github.com/apache/arrow-rs/pull/4376#issuecomment-1580839944


   The benchmarks in #4378 show this to have a minor performance benefit (roughly 3% to 9% on the ListArray cases below), likely because it no longer needs to buffer and split off definition levels and values.
   
   ```
   arrow_array_reader/ListArray/plain encoded optional strings no NULLs
                           time:   [1.5840 ms 1.5868 ms 1.5903 ms]
                           change: [-8.9378% -8.6442% -8.3995%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 13 outliers among 100 measurements (13.00%)
     2 (2.00%) low mild
     4 (4.00%) high mild
     7 (7.00%) high severe
   Benchmarking arrow_array_reader/ListArray/plain encoded optional strings half NULLs: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.1s, enable flat sampling, or reduce sample count to 60.
   arrow_array_reader/ListArray/plain encoded optional strings half NULLs
                           time:   [1.2136 ms 1.2143 ms 1.2150 ms]
                           change: [-2.9329% -2.8874% -2.8359%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high severe
   ```
   
   Looking at the flamegraph of this PR, we can see that reading the repetition levels is a relatively small portion of the runtime, at least compared to the overheads associated with stripping empty lists and padding nulls, which makes the improvement even more impressive.
   
   
![image](https://github.com/apache/arrow-rs/assets/1781103/8618b72d-9055-43fa-94bb-fd3eec62bced)
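
For readers unfamiliar with record delimiting: in Parquet, a repetition level of 0 marks the start of a new top-level record, so record boundaries can be found by scanning the repetition levels directly. The following is only a minimal sketch of that idea and not the code from this PR; the function name `delimit_records`, its signature, and the `at_end` flag are hypothetical and chosen purely for illustration.

```rust
/// Hypothetical helper, not part of arrow-rs: scans a buffer of repetition
/// levels that begins on a record boundary and reports how many complete
/// records it contains (capped at `max_records`) and how many levels those
/// records span.
///
/// `at_end` is an assumption of this sketch: it signals that no further
/// levels follow, so the trailing record can be treated as complete.
pub fn delimit_records(
    rep_levels: &[i16],
    max_records: usize,
    at_end: bool,
) -> (usize, usize) {
    if rep_levels.is_empty() || max_records == 0 {
        return (0, 0);
    }

    let mut records = 0;
    let mut consumed = 0;

    // Skip index 0: it opens the first record rather than closing one.
    for (idx, &level) in rep_levels.iter().enumerate().skip(1) {
        if level == 0 {
            // A level of 0 starts a new record, so the previous one is complete.
            records += 1;
            consumed = idx;
            if records == max_records {
                return (records, consumed);
            }
        }
    }

    if at_end {
        // Nothing follows, so the final record ends with the buffer.
        (records + 1, rep_levels.len())
    } else {
        // The final record may continue in the next batch of levels.
        (records, consumed)
    }
}
```

For the levels `[0, 1, 1, 0, 0, 1]`, asking for two records yields two records spanning the first four levels. Without `at_end`, the last record is left unconsumed because more levels for it may still arrive.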
   
   

