aarashy opened a new issue, #3215: URL: https://github.com/apache/arrow-rs/issues/3215
**Describe the bug**

When reading Arrow IPC bytes, boolean array inputs can trigger a panic under certain conditions. If my input bytes are bad, I want the arrow API to return an `Err`, not panic. In this case, I don't think there's anything wrong with my input bytes. Rather, it looks like an off-by-one error inside `arrow-rs` is breaking an invariant.

The error is produced in the `validate` routine: https://github.com/apache/arrow-rs/blob/master/arrow-data/src/data.rs#L667-L673

It is then unwrapped by the caller, `create_primitive_array`, which DOES NOT return a `Result`: https://github.com/apache/arrow-rs/blob/master/arrow-ipc/src/reader.rs#L487

The use of `unwrap` suggests this function presupposes invariants that make unwrapping safe, but for some inputs those invariants are not met.

Since the problem appears to be with the length of a `buffer`, I traced the origin of that buffer (per my stack trace) to https://github.com/apache/arrow-rs/blob/master/arrow-ipc/src/reader.rs#L1140, though I might be wrong.
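For context on the check that fails: a Boolean array stores one bit per value, so the bitmap buffer must hold at least `ceil((offset + len) / 8)` bytes. Below is a minimal sketch of that arithmetic, returning an `Err` rather than panicking. It is not the `arrow-rs` code; `min_bitmap_bytes` and `check_bitmap` are hypothetical names, and the assumption that the array offset is included in the bit count is mine.

```rust
/// Hypothetical helper: minimum number of bytes needed to hold
/// `offset + len` bits, rounded up to a whole byte.
fn min_bitmap_bytes(offset: usize, len: usize) -> usize {
    (offset + len + 7) / 8
}

/// Hypothetical check: reports a too-short bitmap buffer as an `Err`
/// so the caller can propagate it instead of unwrapping.
fn check_bitmap(buffer_len: usize, offset: usize, len: usize) -> Result<(), String> {
    let needed = min_bitmap_bytes(offset, len);
    if buffer_len < needed {
        return Err(format!(
            "Need at least {needed} bytes for bitmap, but got {buffer_len}"
        ));
    }
    Ok(())
}
```

Under this arithmetic, a 320-byte buffer covers exactly 2560 bits, so a requirement of 321 bytes corresponds to `offset + len` between 2561 and 2568.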
```
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Need at least 321 bytes for bitmap in buffers[0] in array of type Boolean, but got 320")', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/arrow-23.0.0/src/ipc/reader.rs:486:14
stack backtrace:
   0: rust_begin_unwind
             at ./rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at ./rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:142:14
   2: core::result::unwrap_failed
             at ./rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/result.rs:1785:5
   3: arrow::ipc::reader::create_primitive_array
   4: arrow::ipc::reader::create_array
   5: arrow::ipc::reader::read_record_batch
   6: arrow::ipc::reader::StreamReader<R>::maybe_next
   7: arrow::ipc::reader::StreamReader<R>::maybe_next
   8: arrow::ipc::reader::StreamReader<R>::maybe_next
   9: arrow::ipc::reader::StreamReader<R>::maybe_next
  10: arrow::ipc::reader::StreamReader<R>::maybe_next
  11: arrow::ipc::reader::StreamReader<R>::maybe_next
  12: arrow::ipc::reader::StreamReader<R>::maybe_next
  13: arrow::ipc::reader::StreamReader<R>::maybe_next
  14: arrow::ipc::reader::StreamReader<R>::maybe_next
  15: arrow::ipc::reader::StreamReader<R>::maybe_next
  16: arrow::ipc::reader::StreamReader<R>::maybe_next
  17: arrow::ipc::reader::StreamReader<R>::maybe_next
  18: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
  19: core::iter::adapters::try_process
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```

**To Reproduce**

I am using the following routine to read IPC bytes. I don't currently have an input `bytes` that triggers this, but I can find one if you think it's critical for debugging this properly.
```
use std::io::Cursor;

use arrow::record_batch::RecordBatch;

/// Decode every record batch from an Arrow IPC stream held in memory.
pub fn from_ipc_bytes(bytes: &[u8]) -> Result<Vec<RecordBatch>, anyhow::Error> {
    let cursor: Cursor<&[u8]> = Cursor::new(bytes);
    let reader = arrow::ipc::reader::StreamReader::try_new(cursor, None)?;
    // The reader yields `Result<RecordBatch, ArrowError>`; stop at the first error.
    let record_batches =
        reader.collect::<Result<Vec<RecordBatch>, arrow::error::ArrowError>>()?;
    Ok(record_batches)
}
```

**Expected behavior**

Either return an error if there's a problem with my input bytes, or succeed if there isn't, but do not panic.

**Additional context**

@alamb - You seem to have written these validation checks, so I wonder if you understand what might be happening here. Let me know if it's crucial for me to provide input bytes that trigger the panic, but I think the stack trace and error message might be enough to go on.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
