masumi-ryugo commented on PR #9869: URL: https://github.com/apache/arrow-rs/pull/9869#issuecomment-4608708759
Thanks @alamb for the bench heads-up. The chunked-read approach was a poor trade against the steady-state cost: `extend_from_slice` over 64 KiB chunks into a fresh `MutableBuffer` per message wiped out the gain on the hot path.

Reworked as up-front caps + the original fast path: `MAX_META_LEN = 16 MiB` and `MAX_BODY_LEN = 2 GiB` reject obvious junk headers before any large allocation, and legitimate inputs hit the same `resize` + `read_exact` / `from_len_zeroed` + `read_exact` they did before this PR.

Local x86_64 bench on `StreamReader/no_validation/read_10`:

| version | time |
| --- | --- |
| upstream `main` | 74.7 µs |
| prev PR HEAD `c36a092` | 138.9 µs (+86%) |
| this push `73847c5` | 74.3 µs |

Caps are intentionally generous; happy to dial them tighter (e.g. metadata cap of 1 MiB) if you'd prefer a stricter ceiling. The 1.2 GiB-header regression test still passes.

Disclosure: drafted with AI assistance, same caveats as the disclosure on #9884.
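For readers skimming the thread, the cap-then-fast-path idea can be sketched roughly as below. This is a minimal std-only illustration, not the code in the PR: `read_capped` and `read_message` are hypothetical helpers, a plain `Vec<u8>` stands in for `MutableBuffer`, and passing both lengths in as arguments is a simplification. Only the two cap values come from the comment above.

```rust
use std::io::{self, Read};

// Cap values as stated in the comment; everything else here is illustrative.
const MAX_META_LEN: u64 = 16 * 1024 * 1024; // 16 MiB
const MAX_BODY_LEN: u64 = 2 * 1024 * 1024 * 1024; // 2 GiB

/// Hypothetical helper: reject a declared length above `cap` before
/// allocating, then take the plain "zeroed buffer + read_exact" path.
fn read_capped<R: Read>(reader: &mut R, len: u64, cap: u64, what: &str) -> io::Result<Vec<u8>> {
    if len > cap {
        // The check runs before any allocation, so a junk header costs nothing.
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("{what} length {len} exceeds cap {cap}"),
        ));
    }
    // Fast path: one allocation sized to the declared length, one read.
    let mut buf = vec![0u8; len as usize];
    reader.read_exact(&mut buf)?;
    Ok(buf)
}

/// Hypothetical helper: read a metadata block and then a body block,
/// each guarded by its own cap.
fn read_message<R: Read>(
    reader: &mut R,
    meta_len: u64,
    body_len: u64,
) -> io::Result<(Vec<u8>, Vec<u8>)> {
    let meta = read_capped(reader, meta_len, MAX_META_LEN, "metadata")?;
    let body = read_capped(reader, body_len, MAX_BODY_LEN, "body")?;
    Ok((meta, body))
}
```

In this sketch a header declaring 1.2 GiB of metadata fails with `InvalidData` before any buffer is allocated, while an in-range length costs a single comparison on top of the original read.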
