masumi-ryugo commented on PR #9868:
URL: https://github.com/apache/arrow-rs/pull/9868#issuecomment-4378352005

   Thanks @etseidl, @alamb — pushed `65374f8` (force-push) scoping this PR down 
per your 2026-05-04 review.
   
   **What's still here**
   
   - `read_list_begin`: parse the long-form list size as 
`i32::try_from(self.read_vlq()?)?`, rejecting any varint above `i32::MAX` at 
the protocol layer.
   - New `ThriftProtocolError::IntegerOverflow` variant + 
`From<TryFromIntError>` impl, mapped to `general_err!("integer overflow 
decoding thrift value")`.
   - `test_read_list_begin_size_above_i32_max_returns_err` covering the 
wraparound directly. Verified it fails on base `fd86c75` (returns 
`Ok(ListIdentifier { size: -2147483648 })`) and passes here.
   
   **What's gone**
   
   - The `read_thrift_vec` switch from `Vec::with_capacity` to 
`try_reserve_exact` is reverted — answering @alamb's "I don't understand why 
this would be reserve_exact": agreed it shouldn't, given the page-index 
regression @etseidl tracked down. `read_thrift_vec` is back to the original 
`Vec::with_capacity(list_ident.size as usize)`.
   - `test_decode_metadata_huge_thrift_list_does_not_panic` removed — that test 
was a regression guard for the `try_reserve_exact` behavior, so it doesn't 
belong on a PR that no longer guarantees that.
   
   Net diff vs `fd86c75` is now `parquet/src/parquet_thrift.rs +33/-1` (down 
from +60/-2).
   
   **Follow-ups, not in this PR**
   
   - Broader `with_capacity` / `reserve` audit on attacker-controlled paths and 
the OOM-vs-overhead tradeoff continues under #9874.
   - The 50-byte ASan OOM repro posted from another account on 2026-05-03 is 
acknowledged and parked — given @etseidl's point that thrift input is 
compressed (10k-column `SchemaElement` ≈ 96 × 10000 = ~960kB allocation from a 
~138kB on-wire schema), an `input_remaining`-style bound would false-positive 
on valid wide schemas. Any further work on that direction belongs in #9874, not 
here.
   
   `cargo test -p parquet --release --lib --skip 
test_read_non_utf8_binary_as_utf8`: 1082 passed. `cargo fmt -p parquet 
--check`: clean. `cargo clippy -p parquet --lib --no-deps`: clean.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to