HippoBaro opened a new pull request, #9696:
URL: https://github.com/apache/arrow-rs/pull/9696
# Which issue does this PR close?
- None, but relates to #9695.
# Rationale for this change
This PR documents and measures the quadratic behavior reported in #9695.
# What changes are included in this PR?
# Are these changes tested?
Add a benchmark that isolates `PushBuffers` overhead during row group
construction, independent of page decoding. It calls `try_next_reader` to build
row group readers without consuming any pages, so the measured cost is purely
buffer lookup, stitching, and release.
Two benchmark groups exercise different scaling axes:
- `1buf`: pushes the entire file as a single buffer, varying column count
(100 to 50k). This isolates the per-range cost of `has_range`/`get_bytes`
lookups and `release_through`.
- `Nbuf`: pushes one buffer per requested range, varying column count (100
to 10k). This isolates the cost when buffer count equals range count.
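To illustrate why the two groups can scale differently, here is a toy model, not the arrow-rs implementation. It assumes a hypothetical buffer store (`ToyBuffers`, an invented name) that answers `has_range` by scanning its buffers linearly, and it counts comparisons for the single-buffer and one-buffer-per-range layouts.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a push-buffer store: a flat list of byte
/// ranges that is scanned linearly on every lookup. This is an assumption
/// made for illustration, not the real `PushBuffers` type.
struct ToyBuffers {
    buffers: Vec<Range<u64>>,
    comparisons: u64,
}

impl ToyBuffers {
    fn new() -> Self {
        Self { buffers: Vec::new(), comparisons: 0 }
    }

    fn push(&mut self, range: Range<u64>) {
        self.buffers.push(range);
    }

    /// Returns true if a single stored buffer fully covers `range`,
    /// counting every buffer inspected along the way.
    fn has_range(&mut self, range: &Range<u64>) -> bool {
        for r in &self.buffers {
            self.comparisons += 1;
            if r.start <= range.start && range.end <= r.end {
                return true;
            }
        }
        false
    }
}

/// Looks up `num_ranges` adjacent ranges against either one large buffer
/// (`one_buffer == true`) or one buffer per range, and returns the total
/// number of comparisons performed.
fn count_comparisons(num_ranges: u64, one_buffer: bool) -> u64 {
    const LEN: u64 = 8; // arbitrary size of each requested range
    let mut store = ToyBuffers::new();
    if one_buffer {
        store.push(0..num_ranges * LEN);
    } else {
        for i in 0..num_ranges {
            store.push(i * LEN..(i + 1) * LEN);
        }
    }
    for i in 0..num_ranges {
        assert!(store.has_range(&(i * LEN..(i + 1) * LEN)));
    }
    store.comparisons
}

fn main() {
    for n in [100u64, 1_000] {
        println!(
            "{n} ranges: 1buf = {} comparisons, Nbuf = {} comparisons",
            count_comparisons(n, true),
            count_comparisons(n, false)
        );
    }
}
```

In this model the single-buffer layout costs one comparison per lookup, while the one-buffer-per-range layout costs n(n+1)/2 comparisons in total. Whether the real code path behaves this way is exactly what the benchmark is meant to measure.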
Baseline results (Apple M1 Max):
```
push_decoder/1buf/1000ranges      323.5 µs
push_decoder/1buf/10000ranges     3.25 ms
push_decoder/1buf/100000ranges    34.6 ms
push_decoder/1buf/500000ranges    185.3 ms
push_decoder/Nbuf/1000ranges      437.2 µs
push_decoder/Nbuf/10000ranges     10.7 ms
push_decoder/Nbuf/100000ranges    711.6 ms
```
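One way to read these numbers is to estimate a local scaling exponent between consecutive measurements, assuming time grows roughly as n^k. The sketch below is only illustrative arithmetic over the figures above; `scaling_exponent` is a helper name introduced here, not part of the benchmark.

```rust
/// Local scaling exponent k between two measurements, assuming time ~ n^k.
fn scaling_exponent(n1: f64, t1: f64, n2: f64, t2: f64) -> f64 {
    (t2 / t1).ln() / (n2 / n1).ln()
}

fn main() {
    // (number of ranges, time in microseconds), copied from the baseline results.
    let one_buf = [
        (1_000.0, 323.5),
        (10_000.0, 3_250.0),
        (100_000.0, 34_600.0),
        (500_000.0, 185_300.0),
    ];
    let n_buf = [
        (1_000.0, 437.2),
        (10_000.0, 10_700.0),
        (100_000.0, 711_600.0),
    ];

    for (name, series) in [("1buf", &one_buf[..]), ("Nbuf", &n_buf[..])] {
        // Compare each measurement with the next one in the series.
        for pair in series.windows(2) {
            let (n1, t1) = pair[0];
            let (n2, t2) = pair[1];
            println!(
                "{name}: {n1} -> {n2} ranges, exponent ~ {:.2}",
                scaling_exponent(n1, t1, n2, t2)
            );
        }
    }
}
```

On these figures the `1buf` exponent stays close to 1 (linear), while the `Nbuf` exponent rises from about 1.4 to about 1.8, which is consistent with a cost that becomes quadratic as the buffer count grows.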
# Are there any user-facing changes?
N/A