HippoBaro opened a new pull request, #9697:
URL: https://github.com/apache/arrow-rs/pull/9697
# Which issue does this PR close?
- Closes #9695 .
# Rationale for this change
The `PushDecoder` (introduced in #7997, #8080) is designed to decouple IO
and CPU. It holds non-contiguous byte ranges, with a `NeedsData`/`push_range`
protocol. However, it requires each logical read to be satisfied in full by a
single physical buffer: `has_range`, `get_bytes`, and `Read::read` all searched
for one buffer that entirely covered the requested range.
This assumption conflates two orthogonal IO strategies:
- Coalescing: the IO layer merges adjacent requested ranges into fewer,
larger fetches.
- Prefetching: the IO layer pushes data ahead of what the decoder has
requested. This is an inversion of control: the IO layer speculatively fills
buffers at offsets not yet requested and for arbitrary buffer sizes.
These two strategies interact poorly with the current release mechanism
(`clear_ranges`), which matches buffers by exact range equality:
- Coalescing is both rewarded and punished. It is load bearing because
without it, the number of physical buffers scale with ranges requested, and
`clear_ranges` performs an O(N×M) scan to remove consumed ranges, producing
quadratic overhead on wide schemas. But it is also punished because a coalesced
buffer never exactly matches any individual requested range, so `clear_ranges`
silently skips it: the buffer leaks in `PushBuffers` until the decoder finishes
or the caller manually calls `release_all_ranges` (#9624). This increases peak
RSS proportionally to the amount of data coalesced ahead of the current row
group.
- Prefetching is structurally impossible: speculatively pushed buffers will
straddle future read boundaries, so the decoder cannot consume them, and
`clear_ranges` cannot release them.
# What changes are included in this PR?
This commit makes `PushBuffers` boundary-agnostic, completing the
prefetching story, and changes the internals to scale with buffer count instead
of range count:
- Buffer stitching: `has_range`, `get_bytes`, and `Read::read` resolve
logical ranges across multiple contiguous physical buffers via binary search,
so the IO layer is free to push arbitrarily-sized parts without knowing future
read boundaries. This is a nice improvement, because some IO layer can be made
much more efficient when using uniform buffers and vectorized reads.
- Incremental release (`release_through`): replaces `clear_ranges` with a
watermark-based release that drops all buffers below a byte offset, trimming
straddling buffers via zero-copy `Bytes::slice`. The decoder calls this
automatically at row-group boundaries.
# Are these changes tested?
Significant coverage added, all tests passing. Benchmark results (vs
baseline in #9696):
```
push_decoder/1buf/1000ranges 321.9 µs (was 323.5 µs, −1%)
push_decoder/1buf/10000ranges 3.26 ms (was 3.25 ms, +0%)
push_decoder/1buf/100000ranges 34.9 ms (was 34.6 ms, +1%)
push_decoder/1buf/500000ranges 192.2 ms (was 185.3 ms, +4%)
push_decoder/Nbuf/1000ranges 363.9 µs (was 437.2 µs, −17%)
push_decoder/Nbuf/10000ranges 3.82 ms (was 10.7 ms, −64%)
push_decoder/Nbuf/100000ranges 42.1 ms (was 711.6 ms, −94%)
```
# Are there any user-facing changes?
`PushBuffers:: clear_all_ranges` marked as deprecated in favor of the newer
`PushBuffers::clear_all`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]