HippoBaro opened a new pull request, #9697:
URL: https://github.com/apache/arrow-rs/pull/9697

   # Which issue does this PR close?
   
   - Closes #9695 .
   
   # Rationale for this change
   
   The `PushDecoder` (introduced in #7997, #8080) is designed to decouple IO 
and CPU. It holds non-contiguous byte ranges, with a `NeedsData`/`push_range` 
protocol. However, it requires each logical read to be satisfied in full by a 
single physical buffer: `has_range`, `get_bytes`, and `Read::read` all searched 
for one buffer that entirely covered the requested range.
   
   This assumption conflates two orthogonal IO strategies:
   
   - Coalescing: the IO layer merges adjacent requested ranges into fewer, 
larger fetches.
   - Prefetching: the IO layer pushes data ahead of what the decoder has 
requested. This is an inversion of control: the IO layer speculatively fills 
buffers at offsets not yet requested and for arbitrary buffer sizes.
   
   These two strategies interact poorly with the current release mechanism 
(`clear_ranges`), which matches buffers by exact range equality:
   
   - Coalescing is both rewarded and punished. It is load bearing because 
without it, the number of physical buffers scale with ranges requested, and 
`clear_ranges` performs an O(N×M) scan to remove consumed ranges, producing 
quadratic overhead on wide schemas. But it is also punished because a coalesced 
buffer never exactly matches any individual requested range, so `clear_ranges` 
silently skips it: the buffer leaks in `PushBuffers` until the decoder finishes 
or the caller manually calls `release_all_ranges` (#9624). This increases peak 
RSS proportionally to the amount of data coalesced ahead of the current row 
group.
   
   - Prefetching is structurally impossible: speculatively pushed buffers will 
straddle future read boundaries, so the decoder cannot consume them, and 
`clear_ranges` cannot release them.
   
   # What changes are included in this PR?
   
   This commit makes `PushBuffers` boundary-agnostic, completing the 
prefetching story, and changes the internals to scale with buffer count instead 
of range count:
   
   - Buffer stitching: `has_range`, `get_bytes`, and `Read::read` resolve 
logical ranges across multiple contiguous physical buffers via binary search, 
so the IO layer is free to push arbitrarily-sized parts without knowing future 
read boundaries. This is a nice improvement, because some IO layer can be made 
much more efficient when using uniform buffers and vectorized reads.
   
   - Incremental release (`release_through`): replaces `clear_ranges` with a 
watermark-based release that drops all buffers below a byte offset, trimming 
straddling buffers via zero-copy `Bytes::slice`. The decoder calls this 
automatically at row-group boundaries.
   
   # Are these changes tested?
   
   Significant coverage added, all tests passing.  Benchmark results (vs 
baseline in #9696):
   
   ```
     push_decoder/1buf/1000ranges       321.9 µs   (was 323.5 µs,  −1%)
     push_decoder/1buf/10000ranges       3.26 ms   (was  3.25 ms,  +0%)
     push_decoder/1buf/100000ranges      34.9 ms   (was  34.6 ms,  +1%)
     push_decoder/1buf/500000ranges     192.2 ms   (was 185.3 ms,  +4%)
     push_decoder/Nbuf/1000ranges       363.9 µs   (was 437.2 µs, −17%)
     push_decoder/Nbuf/10000ranges       3.82 ms   (was  10.7 ms, −64%)
     push_decoder/Nbuf/100000ranges      42.1 ms   (was 711.6 ms, −94%)
   ```
   
   # Are there any user-facing changes?
   
   `PushBuffers:: clear_all_ranges` marked as deprecated in favor of the newer 
`PushBuffers::clear_all`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to