youichi-uda commented on PR #9868:
URL: https://github.com/apache/arrow-rs/pull/9868#issuecomment-4378346120
Thanks @etseidl, @alamb — pushed `65374f8` (force-push) scoping this PR down
per your 2026-05-04 review.
**What's still here**
- `read_list_begin`: parse the long-form list size as
`i32::try_from(self.read_vlq()?)?`, rejecting any varint above `i32::MAX` at
the protocol layer.
- New `ThriftProtocolError::IntegerOverflow` variant +
`From<TryFromIntError>` impl, mapped to `general_err!("integer overflow
decoding thrift value")`.
- `test_read_list_begin_size_above_i32_max_returns_err` covering the
wraparound directly. Verified it fails on base `fd86c75` (returns
`Ok(ListIdentifier { size: -2147483648 })`) and passes here.
**What's gone**
- The `read_thrift_vec` switch from `Vec::with_capacity` to
`try_reserve_exact` is reverted — answering @alamb's "I don't understand why
this would be reserve_exact": agreed it shouldn't, given the page-index
regression @etseidl tracked down. `read_thrift_vec` is back to the original
`Vec::with_capacity(list_ident.size as usize)`.
- `test_decode_metadata_huge_thrift_list_does_not_panic` removed — that test
was a regression guard for the `try_reserve_exact` behavior, so it doesn't
belong on a PR that no longer guarantees that.
Net diff vs `fd86c75` is now `parquet/src/parquet_thrift.rs +33/-1` (down
from +60/-2).
**Follow-ups, not in this PR**
- Broader `with_capacity` / `reserve` audit on attacker-controlled paths and
the OOM-vs-overhead tradeoff continues under #9874.
- The 50-byte ASan OOM repro I posted from a different account on 2026-05-03
is acknowledged and parked — given @etseidl's point that thrift input is
compressed (10k-column `SchemaElement` ≈ 96 × 10000 = ~960kB allocation from a
~138kB on-wire schema), an `input_remaining`-style bound would false-positive
on valid wide schemas. Any further work on that direction belongs in #9874, not
here.
`cargo test -p parquet --release --lib --skip
test_read_non_utf8_binary_as_utf8`: 1082 passed. `cargo fmt -p parquet
--check`: clean. `cargo clippy -p parquet --lib --no-deps`: clean.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]