shinmao opened a new issue, #9106: URL: https://github.com/apache/arrow-rs/issues/9106
**Bug Description** The root cause is the missing invariant validation between `Array::len()` and `Array::logical_nulls().len()` in `try_binary`, leading to OOB writes when processing arrays with inconsistent metadata. The `try_binary` function performs unchecked memory writes based on the length reported by `logical_nulls()`, but only allocates buffer space based on `Array::len()`. Since `Array` is not an `unsafe trait`, user implementations can violate the implicit assumption that these two values must match. https://github.com/apache/arrow-rs/blob/2507946be697d511689a8f59b6a7db45ef40854b/arrow-arith/src/arity.rs#L271-L291 `Array` is not an `unsafe trait`, so implementors have no safety contract to uphold. The implicit assumption is violated where `logical_nulls().len() == len()`, but this is not documented as a requirement, not enforced by the type system, and also not validated at runtime. Therefore, the attack can forge `len() = 1` and `logical_nulls().len() = 2`, which causes 1-element buffer allocation, but 2-iteration loop , **OOB write at index 1** **To Reproduce** ```rs use arrow_arith::arity::try_binary; use arrow_array::{Array, ArrayAccessor, Int32Array}; use arrow_array::types::Int32Type; use arrow_buffer::NullBuffer; use arrow_data::ArrayData; use std::sync::Arc; #[derive(Debug, Clone)] struct Evil { inner: Int32Array, fake_nulls: NullBuffer, } impl Array for Evil { fn as_any(&self) -> &dyn std::any::Any { self } fn to_data(&self) -> ArrayData { self.inner.to_data() } fn into_data(self) -> ArrayData { self.inner.into_data() } fn data_type(&self) -> &arrow_schema::DataType { self.inner.data_type() } fn slice(&self, o: usize, l: usize) -> Arc<dyn Array> { Arc::new(Evil { inner: self.inner. slice(o, l), fake_nulls: self.fake_nulls.slice(o, l), }) } fn len(&self) -> usize { 1 } fn is_empty(&self) -> bool { false } fn offset(&self) -> usize { 0 } fn nulls(&self) -> Option<&NullBuffer> { Some(&self.fake_nulls) } fn logical_nulls(&self) -> Option<NullBuffer> { Some(NullBuffer::new_valid(2)) } fn get_buffer_memory_size(&self) -> usize { 0 } fn get_array_memory_size(&self) -> usize { 0 } } impl ArrayAccessor for Evil { type Item = i32; fn value(&self, i: usize) -> i32 { self.inner.value(i) } unsafe fn value_unchecked(&self, i: usize) -> i32 { self.inner.value_unchecked(i) } } fn main() { let inner = Int32Array::from(vec![42, 100]); let fake_nulls = NullBuffer::from(vec![false]); // [null] let a = Evil { inner: inner. clone(), fake_nulls: fake_nulls.clone() }; let b = Evil { inner, fake_nulls }; let result = try_binary::<_, _, _, Int32Type>(a, b, |x, y| { println! (" op({}, {})", x, y); Ok(x + y) }); match result { Ok(arr) => println!("\n Result: {:?}", arr), Err(e) => println!("\n Error: {}", e), } } ``` with cargo run, ```sh thread 'main' (3689217) panicked at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-arith-57.1.0/src/arity.rs:285:24: unsafe precondition(s) violated: slice::get_unchecked_mut requires that the index is within the slice This indicates a bug in the program. This Undefined Behavior check is optional, and cannot be relied on for safety. note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace thread caused non-unwinding panic. aborting. Aborted (core dumped) ``` **Additional Context** Relevant PR: https://github.com/apache/arrow-rs/pull/9092 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
