Re: [PR] fix(arrow-ipc): bound MessageReader allocations by actual stream bytes [arrow-rs]

via GitHub Tue, 02 Jun 2026 20:00:40 -0700


masumi-ryugo commented on PR #9869:
URL: https://github.com/apache/arrow-rs/pull/9869#issuecomment-4608708759


   Thanks @alamb for the bench heads-up. The chunked-read approach was a poor 
trade against the steady-state cost — `extend_from_slice` over 64 KiB chunks 
into a fresh `MutableBuffer` per message wiped out the gain on the hot path.
   
   Reworked as up-front caps + the original fast path: `MAX_META_LEN = 16 MiB` 
and `MAX_BODY_LEN = 2 GiB` reject obvious junk headers before any large 
allocation, and legitimate inputs hit the same `resize` + `read_exact` / 
`from_len_zeroed` + `read_exact` they did before this PR.
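
   For readers skimming the thread, the shape of "up-front caps + fast path" can be sketched roughly as below. This is an illustration under assumptions, not the code in the PR: `read_capped` and `read_message` are hypothetical names, a plain `Vec<u8>` stands in for the crate's buffer types, and only the two cap values are taken from the comment above.

```rust
use std::io::{self, Read};

// Caps quoted in the comment above; u64 so the 2 GiB value also fits on 32-bit targets.
const MAX_META_LEN: u64 = 16 * 1024 * 1024; // 16 MiB
const MAX_BODY_LEN: u64 = 2 * 1024 * 1024 * 1024; // 2 GiB

/// Hypothetical helper: reject a declared length above `cap` before allocating,
/// otherwise allocate once and fill the buffer with a single `read_exact`.
fn read_capped<R: Read>(reader: &mut R, declared_len: u64, cap: u64) -> io::Result<Vec<u8>> {
    if declared_len > cap {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("declared length {declared_len} exceeds cap {cap}"),
        ));
    }
    // Fast path: one zeroed allocation of the declared size, no chunked copying.
    let mut buf = vec![0u8; declared_len as usize];
    reader.read_exact(&mut buf)?;
    Ok(buf)
}

/// Hypothetical wrapper: read metadata, then body, each checked against its own cap.
fn read_message<R: Read>(
    reader: &mut R,
    meta_len: u64,
    body_len: u64,
) -> io::Result<(Vec<u8>, Vec<u8>)> {
    let meta = read_capped(reader, meta_len, MAX_META_LEN)?;
    let body = read_capped(reader, body_len, MAX_BODY_LEN)?;
    Ok((meta, body))
}
```

   In this sketch a declared length below the cap is still allocated in full before `read_exact` discovers a truncated stream, so the cap limits the worst case and does not tie the allocation to the bytes actually present.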
   
   Local x86_64 bench on `StreamReader/no_validation/read_10`:
   
   | version | time |
   | --- | --- |
   | upstream `main` | 74.7 µs |
   | prev PR HEAD `c36a092` | 138.9 µs (+86%) |
   | this push `73847c5` | 74.3 µs |
   
   Caps are intentionally generous; happy to dial them tighter (e.g. metadata 
cap of 1 MiB) if you'd prefer a stricter ceiling. The 1.2 GiB-header regression 
test still passes.
   
   Disclosure: drafted with AI assistance, same caveats as the disclosure on 
#9884.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
