masumi-ryugo commented on PR #9884: URL: https://github.com/apache/arrow-rs/pull/9884#issuecomment-4378746608
Thanks @etseidl, and apologies for the noise on this one; you read it correctly.

**Disclosure**: yes, this PR was drafted with AI assistance, and on a re-read against [the AI-generated code submission guidelines](https://arrow.apache.org/docs/dev/developers/overview.html#ai-generated-code) it hits all three of the pitfalls flagged there: overly verbose comments, unnecessary test cases, and a fix whose framing didn't match the actual bug. I should have caught that before sending. Going forward I'll do the same self-review pass against that guideline before opening any AI-assisted PR.

**Substance check on your review**: you're right that the original framing was wrong. Walking through it on the actual code:

- `Some(-1) as usize` → `usize::MAX` → `Vec::with_capacity(usize::MAX)` does panic with `capacity overflow` from `alloc::raw_vec::handle_error`. **This is the real bug**, and `usize::try_from(n)?` is the right fix.
- `Some(i32::MAX) as usize` → `Vec::with_capacity(i32::MAX)` for `SchemaElement` (≈96 bytes) requests ~200 GB. On 64-bit Linux that's well under `isize::MAX`, so the `alloc_guard` check passes; with default overcommit the allocation succeeds lazily and **never panics on main**. So the `i32::MAX` test wasn't exercising a real codepath, and the remaining-elements bound was protecting against a panic the platform doesn't actually produce.
- The 22-byte fuzzer repro: as you noted, it doesn't error on main. It was tagged as a regression test because libfuzzer/ASan reported it as an OOM, but those report any large allocation as OOM regardless of whether the native allocator would have succeeded. It isn't a regression test for the codebase's behavior; my mistake to label it that way.

**Force-pushed `dc1793f`** (was `5847354`). Diff is now `parquet/src/schema/types.rs +19/-1`:

- Applied your suggested edit verbatim: a single-line change to use `Vec::with_capacity(usize::try_from(n)?)`.
- Removed the 10-line comment block.
- Removed the remaining-elements bound check.
- Removed the `i32::MAX` overflow test and the 22-byte fuzzer-repro test (neither tested what it claimed).
- Kept `test_parquet_schema_from_array_rejects_negative_num_children`; this one does fail on `5847354`'s parent (panics with `capacity overflow`) and passes here.
- PR title and commit message rewritten to match the actual fix.

`cargo test -p parquet --release --lib schema::`: 105 passed. `cargo fmt --check` and `cargo clippy --no-deps`: clean.

If you'd like the remaining-elements consistency check (catching a `num_children` that is positive but exceeds the schema-array tail), I'm happy to send it as a follow-up PR with a real failure case, i.e. a multi-element schema where the inner recursion currently produces a less clear error than an upfront bound would. Otherwise, glad to drop it entirely.
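For anyone skimming the thread, the negative-`num_children` failure mode and the `try_from` fix can be sketched in isolation. This is a minimal illustration, not the actual `parquet` internals; the function names and the `u64` element type are made up for the example:

```rust
use std::num::TryFromIntError;

// Buggy shape: a Thrift i32 count cast straight to usize.
// -1 as usize wraps to usize::MAX, and Vec::with_capacity(usize::MAX)
// panics with "capacity overflow" inside alloc::raw_vec.
fn alloc_children_unchecked(num_children: i32) -> Vec<u64> {
    Vec::with_capacity(num_children as usize)
}

// Fixed shape: usize::try_from rejects negative values with an Err,
// which the caller can surface as a parse error instead of a panic.
fn alloc_children_checked(num_children: i32) -> Result<Vec<u64>, TryFromIntError> {
    Ok(Vec::with_capacity(usize::try_from(num_children)?))
}

fn main() {
    // The negative case now returns Err instead of panicking.
    assert!(alloc_children_checked(-1).is_err());
    assert_eq!(alloc_children_checked(3).unwrap().capacity(), 3);

    // Demonstrate the original panic without aborting the program.
    let panicked =
        std::panic::catch_unwind(|| alloc_children_unchecked(-1)).is_err();
    assert!(panicked);
}
```

In the real code the `?` converts the `TryFromIntError` into the crate's error type rather than returning it directly, but the shape of the fix is the same.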
