HippoBaro opened a new pull request, #9696:
URL: https://github.com/apache/arrow-rs/pull/9696

   # Which issue does this PR close?
   
   - None, but this PR relates to #9695.
   
   # Rationale for this change
   
   This PR is meant to document and measure the quadratic behavior reported in 
the above issue.
   
   # What changes are included in this PR?
   
   # Are these changes tested?
   
   Add a benchmark that isolates `PushBuffers` overhead during row group 
construction, independent of page decoding. It calls `try_next_reader` to build 
row group readers without consuming any pages, so the measured cost is purely 
buffer lookup, stitching, and release.
   
   Two benchmark groups exercise different scaling axes:
   
   - `1buf`: pushes the entire file as a single buffer, varying column count 
(100 to 50k). This isolates the per-range cost of `has_range`/`get_bytes` 
lookups and `release_through`.
   
   - `Nbuf`: pushes one buffer per requested range, varying column count (100 
to 10k). This isolates the cost when buffer count equals range count.
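
To illustrate why the two groups can scale differently, here is a minimal toy model. It is not the arrow-rs `PushBuffers` code: `ToyBuffers`, `count_probes`, and the `probes` counter are hypothetical names, and the linear scan over a flat buffer list is an assumption made only for illustration. It counts how many buffers are inspected when looking up N ranges against one large buffer versus N small buffers.

```rust
// Toy model only: NOT the arrow-rs `PushBuffers` implementation.
// Assumption: buffers live in a flat list and each lookup scans it linearly.
use std::ops::Range;

/// Hypothetical stand-in for a set of pushed byte buffers.
struct ToyBuffers {
    /// Byte range covered by each pushed buffer, in push order.
    ranges: Vec<Range<u64>>,
    /// Total number of buffers inspected across all lookups.
    probes: u64,
}

impl ToyBuffers {
    fn new(ranges: Vec<Range<u64>>) -> Self {
        Self { ranges, probes: 0 }
    }

    /// Returns true if a single buffer fully covers `wanted`.
    fn has_range(&mut self, wanted: &Range<u64>) -> bool {
        for buf in &self.ranges {
            self.probes += 1;
            if buf.start <= wanted.start && wanted.end <= buf.end {
                return true;
            }
        }
        false
    }
}

/// Counts the probes needed to look up `n` equal-sized, adjacent ranges.
/// `one_buffer = true` mimics the `1buf` setup, `false` mimics `Nbuf`.
fn count_probes(n: u64, one_buffer: bool) -> u64 {
    const LEN: u64 = 8; // arbitrary range length in bytes
    let wanted: Vec<Range<u64>> = (0..n).map(|i| i * LEN..(i + 1) * LEN).collect();
    let pushed = if one_buffer {
        vec![0..n * LEN] // the whole "file" as a single buffer
    } else {
        wanted.clone() // one buffer per requested range
    };
    let mut buffers = ToyBuffers::new(pushed);
    for r in &wanted {
        assert!(buffers.has_range(r));
    }
    buffers.probes
}

fn main() {
    for n in [1_000u64, 10_000] {
        println!(
            "{n} ranges: 1buf probes = {}, Nbuf probes = {}",
            count_probes(n, true),
            count_probes(n, false)
        );
    }
}
```

In this toy model the single-buffer case needs one probe per range, while the one-buffer-per-range case needs n(n+1)/2 probes in total, which is the kind of quadratic growth the `Nbuf` group is meant to expose. The real implementation may behave differently.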
   
   Baseline results (Apple M1 Max):
   
   ```
     push_decoder/1buf/1000ranges       323.5 µs
     push_decoder/1buf/10000ranges       3.25 ms
     push_decoder/1buf/100000ranges      34.6 ms
     push_decoder/1buf/500000ranges     185.3 ms
     push_decoder/Nbuf/1000ranges       437.2 µs
     push_decoder/Nbuf/10000ranges       10.7 ms
     push_decoder/Nbuf/100000ranges     711.6 ms
   ```
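
One way to read these numbers is to estimate an empirical scaling exponent between consecutive rows, assuming time grows roughly as n^k. The sketch below does that arithmetic on the figures above; the helper `scaling_exponent` is a hypothetical name and is not part of the benchmark.

```rust
/// Empirical exponent k between two measurements, assuming time ~ n^k.
fn scaling_exponent(n1: f64, t1: f64, n2: f64, t2: f64) -> f64 {
    (t2 / t1).ln() / (n2 / n1).ln()
}

fn main() {
    // (ranges, time in microseconds), copied from the baseline results.
    let one_buf = [
        (1_000.0, 323.5),
        (10_000.0, 3_250.0),
        (100_000.0, 34_600.0),
        (500_000.0, 185_300.0),
    ];
    let n_buf = [
        (1_000.0, 437.2),
        (10_000.0, 10_700.0),
        (100_000.0, 711_600.0),
    ];

    for (name, rows) in [("1buf", &one_buf[..]), ("Nbuf", &n_buf[..])] {
        // Compare each row with the next one.
        for pair in rows.windows(2) {
            let ((n1, t1), (n2, t2)) = (pair[0], pair[1]);
            let k = scaling_exponent(n1, t1, n2, t2);
            println!("{name}: {n1} -> {n2} ranges, exponent ~ {k:.2}");
        }
    }
}
```

On these figures the `1buf` exponents stay close to 1 (roughly linear), while the `Nbuf` exponents rise from about 1.4 to about 1.8, approaching quadratic growth at the largest size.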
   
   # Are there any user-facing changes?
   
   N/A

