omertt27 opened a new pull request, #50154:
URL: https://github.com/apache/arrow/pull/50154

   ### Rationale for this change
   
   When the `CappedMemoryPool` (2.2 GB limit, added in GH-48105) triggers an 
OOM during encoding roundtrip verification, the resulting `Status::OutOfMemory` 
propagates back to call sites that used `ARROW_CHECK_OK(...)` or 
`.ValueOrDie()`. Both end in `ARROW_LOG(FATAL)` → `std::abort()`. An abort is 
not an exception, so `BEGIN_PARQUET_CATCH_EXCEPTIONS` cannot intercept it, and 
OSS-Fuzz sees a process crash instead of a resource-limit event.
   
   ### What changes are included in this PR?
   
   Added `FuzzCheckOk(const Status&)` in the anonymous namespace of 
`fuzz_encoding_internal.cc`:
   
   ```cpp
   // OOM during fuzzing is an expected soft failure; any other non-OK status
   // indicates a real bug and should abort so OSS-Fuzz can report it.
   Status FuzzCheckOk(const Status& st) {
     if (st.IsOutOfMemory()) return st;
     ARROW_CHECK_OK(st);
     return Status::OK();
   }
   ```
   
   Six call sites in `TypedFuzzEncoding::Fuzz()` were replaced with 
`RETURN_NOT_OK(FuzzCheckOk(...))`:
   - `reference_array_->ValidateFull()`
   - `DecodeArrow(...).ValueOrDie()` (replaced with status check + 
`std::move(*result)`)
   - `array->ValidateFull()` (on roundtrip result)
   - `CompareAgainstReference(array)`
   - `RunOnDecodedChunks(...)` × 2 (both `arrow_supported()` and non-Arrow 
branches)
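   
   The call-site pattern above can be sketched in isolation. This is a minimal 
illustration, not the code in this PR: `Status`, `SKETCH_RETURN_NOT_OK`, and 
`FuzzOnce` are hypothetical stand-ins for `arrow::Status`, `RETURN_NOT_OK`, and 
the body of `TypedFuzzEncoding::Fuzz()`, and `std::abort()` stands in for 
`ARROW_CHECK_OK`.
   
   ```cpp
   #include <cassert>
   #include <cstdlib>
   #include <iostream>
   #include <string>
   #include <utility>

   // Hypothetical stand-in for arrow::Status: just enough to show control flow.
   enum class Code { kOk, kOutOfMemory, kInvalid };

   class Status {
    public:
     Status() = default;
     Status(Code code, std::string msg) : code_(code), msg_(std::move(msg)) {}
     static Status OK() { return Status(); }
     static Status OutOfMemory(std::string msg) {
       return Status(Code::kOutOfMemory, std::move(msg));
     }
     static Status Invalid(std::string msg) {
       return Status(Code::kInvalid, std::move(msg));
     }
     bool ok() const { return code_ == Code::kOk; }
     bool IsOutOfMemory() const { return code_ == Code::kOutOfMemory; }
     const std::string& message() const { return msg_; }

    private:
     Code code_ = Code::kOk;
     std::string msg_;
   };

   // Same shape as the helper in this PR: OOM is handed back to the caller,
   // any other failure aborts (std::abort stands in for ARROW_CHECK_OK).
   Status FuzzCheckOk(const Status& st) {
     if (st.IsOutOfMemory()) return st;
     if (!st.ok()) {
       std::cerr << "fatal: " << st.message() << std::endl;
       std::abort();
     }
     return Status::OK();
   }

   // Hypothetical stand-in for RETURN_NOT_OK: return early on any non-OK status.
   #define SKETCH_RETURN_NOT_OK(expr)        \
     do {                                    \
       Status sketch_st = (expr);            \
       if (!sketch_st.ok()) return sketch_st; \
     } while (false)

   // Hypothetical fuzz body with two checked steps. An OOM in either step
   // unwinds to the caller instead of killing the process.
   Status FuzzOnce(const Status& validate_result, const Status& decode_result,
                   int* steps_run) {
     assert(steps_run != nullptr);
     SKETCH_RETURN_NOT_OK(FuzzCheckOk(validate_result));
     ++*steps_run;
     SKETCH_RETURN_NOT_OK(FuzzCheckOk(decode_result));
     ++*steps_run;
     return Status::OK();
   }
   ```
   
   In this sketch, passing `Status::OutOfMemory(...)` as `decode_result` makes 
`FuzzOnce` return after the first step with the OOM status, while passing 
`Status::Invalid(...)` would still abort the process.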
   
   Three invariant checks intentionally left as hard aborts — they indicate 
actual decoder/encoder bugs, not resource limits:
   - `ARROW_CHECK_LE(values_read, read_size)` — decoder returning more values 
than requested
   - `ARROW_CHECK_EQ(acc.chunks.size(), 0)` — BinaryBuilder invariant
   - `ARROW_CHECK_EQ(offset, total_data_size)` — byte count invariant in 
`MakeArrow`
   
   ### Are these changes tested?
   
   The OOM path is exercised by `fuzzing_memory_pool()` in `fuzz_internal.cc` 
whenever the cumulative allocation exceeds the 2.2 GB cap. The existing 
`parquet-encoding-test` suite covers the non-OOM code paths.
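   
   For readers unfamiliar with how a capped pool produces the OOM path, the 
following is a rough sketch only. `CappedPoolSketch` is a hypothetical class, 
not Arrow's `CappedMemoryPool` or `fuzzing_memory_pool()`; it assumes the cap 
applies to bytes currently outstanding and reports failure with a `bool` 
rather than a `Status`.
   
   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <cstdlib>

   // Hypothetical capped allocator: refuses any request that would push the
   // outstanding byte count past the cap, instead of letting the process die.
   class CappedPoolSketch {
    public:
     explicit CappedPoolSketch(std::int64_t cap_bytes) : cap_bytes_(cap_bytes) {}

     // Returns true and sets *out on success. Returns false (the "OOM" outcome)
     // and leaves *out untouched when the cap would be exceeded.
     bool Allocate(std::int64_t size, void** out) {
       if (size < 0 || size > cap_bytes_ - bytes_allocated_) return false;
       void* p = std::malloc(size == 0 ? 1 : static_cast<std::size_t>(size));
       if (p == nullptr) return false;
       bytes_allocated_ += size;
       *out = p;
       return true;
     }

     // Releases a block previously obtained from Allocate with the same size.
     void Free(void* p, std::int64_t size) {
       assert(size <= bytes_allocated_);
       std::free(p);
       bytes_allocated_ -= size;
     }

     std::int64_t bytes_allocated() const { return bytes_allocated_; }

    private:
     const std::int64_t cap_bytes_;
     std::int64_t bytes_allocated_ = 0;
   };
   ```
   
   A caller that treats the `false` return as a soft failure and unwinds is the 
behaviour this PR aims for at the six call sites listed above.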
   
   Closes #50149


