aarashy opened a new issue, #3215:
URL: https://github.com/apache/arrow-rs/issues/3215

   **Describe the bug**
   When reading Arrow IPC bytes, boolean array inputs can trigger a panic under 
certain conditions.
   If my input bytes are bad, I want the arrow API to return an `Err`, not 
panic. 
   
   In this case, I don't think there's anything wrong with my input bytes. 
Rather, it looks like there's an off-by-one error inside `arrow-rs` that 
breaks an internal invariant. 
   
   The error is produced in the `validate` routine:
   
https://github.com/apache/arrow-rs/blob/master/arrow-data/src/data.rs#L667-L673
   
   And it's being unwrapped in the caller in `create_primitive_array`, which 
DOES NOT return a `Result`.
   https://github.com/apache/arrow-rs/blob/master/arrow-ipc/src/reader.rs#L487
   
   Judging by the fact that `unwrap` is used, this function presupposes certain 
invariants that make it safe to use `unwrap`, but for some inputs, these 
invariants seem to not be met.
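
To make the difference concrete, here is a minimal sketch of the two shapes. It is not the actual `arrow-rs` code: `SketchError`, `validate_bitmap_len`, `build_panicking`, and `build_fallible` are hypothetical names used only for illustration.

```rust
/// Hypothetical error type standing in for the library's error enum.
#[derive(Debug, PartialEq)]
enum SketchError {
    InvalidArgument(String),
}

/// Hypothetical stand-in for the validation step: fails when the buffer is too short.
fn validate_bitmap_len(needed: usize, got: usize) -> Result<usize, SketchError> {
    if got < needed {
        return Err(SketchError::InvalidArgument(format!(
            "Need at least {} bytes for bitmap, but got {}",
            needed, got
        )));
    }
    Ok(got)
}

/// Panicking shape: the caller does not return `Result`, so a failed
/// validation can only surface as a panic.
fn build_panicking(needed: usize, got: usize) -> usize {
    validate_bitmap_len(needed, got).unwrap()
}

/// Fallible shape: the caller returns `Result` and forwards the error with `?`.
fn build_fallible(needed: usize, got: usize) -> Result<usize, SketchError> {
    let len = validate_bitmap_len(needed, got)?;
    Ok(len)
}
```

With the fallible shape, `build_fallible(321, 320)` yields an `Err` that the caller can handle, whereas `build_panicking(321, 320)` aborts the current thread.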
   
   Since the problem appears to be with the length of a `buffer`, I'm tracing 
the origin of that buffer (per my stack trace) to 
https://github.com/apache/arrow-rs/blob/master/arrow-ipc/src/reader.rs#L1140 - 
I might be wrong.
   
   ```
   thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Need at least 321 bytes for bitmap in buffers[0] in array of type Boolean, but got 320")', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/arrow-23.0.0/src/ipc/reader.rs:486:14
   stack backtrace:
      0: rust_begin_unwind
                at ./rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:584:5
      1: core::panicking::panic_fmt
                at ./rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:142:14
      2: core::result::unwrap_failed
                at ./rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/result.rs:1785:5
      3: arrow::ipc::reader::create_primitive_array
      4: arrow::ipc::reader::create_array
      5: arrow::ipc::reader::read_record_batch
      6: arrow::ipc::reader::StreamReader<R>::maybe_next
      7: arrow::ipc::reader::StreamReader<R>::maybe_next
      8: arrow::ipc::reader::StreamReader<R>::maybe_next
      9: arrow::ipc::reader::StreamReader<R>::maybe_next
     10: arrow::ipc::reader::StreamReader<R>::maybe_next
     11: arrow::ipc::reader::StreamReader<R>::maybe_next
     12: arrow::ipc::reader::StreamReader<R>::maybe_next
     13: arrow::ipc::reader::StreamReader<R>::maybe_next
     14: arrow::ipc::reader::StreamReader<R>::maybe_next
     15: arrow::ipc::reader::StreamReader<R>::maybe_next
     16: arrow::ipc::reader::StreamReader<R>::maybe_next
     17: arrow::ipc::reader::StreamReader<R>::maybe_next
     18: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
     19: core::iter::adapters::try_process
   note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
   ```
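
The numbers in the panic message can be read as follows, assuming the check is a ceiling division of the array length in bits by 8 (an assumption; the real check may also take an array offset into account). "Need 321, got 320" would then mean the declared length was between 2561 and 2568 values, while the buffer only had room for 2560. The function names below are hypothetical.

```rust
/// Bytes needed to hold `len_bits` bits in a packed bitmap: ceil(len_bits / 8).
fn bitmap_bytes_needed(len_bits: usize) -> usize {
    (len_bits + 7) / 8
}

/// Hypothetical check mirroring the shape of the failing validation:
/// returns `Err` when the buffer is shorter than the bitmap requires.
fn check_bitmap_len(len_bits: usize, buffer_bytes: usize) -> Result<(), String> {
    let needed = bitmap_bytes_needed(len_bits);
    if buffer_bytes < needed {
        return Err(format!(
            "Need at least {} bytes for bitmap, but got {}",
            needed, buffer_bytes
        ));
    }
    Ok(())
}
```

Under this assumption a single extra value past a multiple of 8 is enough to require one more byte, which is consistent with an off-by-one somewhere between the writer's buffer length and the reader's length check.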
   
   **To Reproduce**
   I am using the following routine to read IPC bytes. I don't currently have 
an input `bytes` that triggers this, but I can find one if you need it to 
debug this properly. 
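   ```
   use std::io::Cursor;

   use arrow::record_batch::RecordBatch;

   // Decode an in-memory Arrow IPC stream into its record batches.
   pub fn from_ipc_bytes(bytes: &[u8]) -> Result<Vec<RecordBatch>, anyhow::Error> {
       let cursor: Cursor<&[u8]> = Cursor::new(bytes);
       let reader = arrow::ipc::reader::StreamReader::try_new(cursor, None)?;
       // The first `Err` item short-circuits the collect; a panic inside the reader does not.
       let record_batches =
           reader.collect::<Result<Vec<RecordBatch>, arrow::error::ArrowError>>()?;
       Ok(record_batches)
   }
   ```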
   
   
   
   **Expected behavior**
   Either return an `Err` if there's a problem with my input bytes, or succeed 
if there is no problem with them, but do not panic.
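
Until the reader itself returns an error, one possible caller-side stopgap is to convert the panic into an `Err` with `std::panic::catch_unwind`. This is only a sketch: `catch_panic_as_err` is a hypothetical helper, it works only when the binary is built with unwinding panics (not `panic = "abort"`), and it is not a substitute for a fix in the library.

```rust
use std::panic::{self, UnwindSafe};

/// Runs `f` and turns a panic into `Err(message)` instead of unwinding
/// further. Hypothetical helper for illustration.
fn catch_panic_as_err<T, F>(f: F) -> Result<T, String>
where
    F: FnOnce() -> T + UnwindSafe,
{
    panic::catch_unwind(f).map_err(|payload| {
        // Panic payloads are usually a `&'static str` or a `String`.
        if let Some(s) = payload.downcast_ref::<&str>() {
            (*s).to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "unknown panic payload".to_string()
        }
    })
}
```

A call such as `catch_panic_as_err(|| from_ipc_bytes(bytes))` would then yield a nested `Result` whose outer layer reports the panic. The panic message is still printed to stderr by the default panic hook.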
   
   
   **Additional context**
   @alamb - You seem to have written these validation checks, so I wonder if 
you might understand what is happening here. Let me know if it's crucial for 
me to provide input bytes that trigger the panic, but I think the stack trace 
and error message here might be enough to go on.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
