shinmao opened a new issue, #9106:
URL: https://github.com/apache/arrow-rs/issues/9106

   **Bug Description**
   The root cause is the missing invariant validation between `Array::len()` 
and `Array::logical_nulls().len()` in `try_binary`, leading to OOB writes when 
processing arrays with inconsistent metadata. The `try_binary` function 
performs unchecked memory writes based on the length reported by 
`logical_nulls()`, but only allocates buffer space based on `Array::len()`. 
Since `Array` is not an `unsafe trait`, user implementations can violate the 
implicit assumption that these two values must match.
   
https://github.com/apache/arrow-rs/blob/2507946be697d511689a8f59b6a7db45ef40854b/arrow-arith/src/arity.rs#L271-L291
   `Array` is not an `unsafe trait`, so implementors have no safety contract to 
uphold. The implicit assumption is violated where `logical_nulls().len() == 
len()`, but this is not documented as a requirement, not enforced by the type 
system, and also not validated at runtime. Therefore, the attack can forge 
`len() = 1` and `logical_nulls().len() = 2`, which causes 1-element buffer 
allocation, but 2-iteration loop , **OOB write at index 1**
   
   **To Reproduce**
   ```rs
   use arrow_arith::arity::try_binary;
   use arrow_array::{Array, ArrayAccessor, Int32Array};
   use arrow_array::types::Int32Type;
   use arrow_buffer::NullBuffer;
   use arrow_data::ArrayData;
   use std::sync::Arc;
   
   #[derive(Debug, Clone)]
   struct Evil {
       inner: Int32Array, fake_nulls: NullBuffer,
   }
   
   impl Array for Evil {
       fn as_any(&self) -> &dyn std::any::Any { self }
       fn to_data(&self) -> ArrayData { self.inner.to_data() }
       fn into_data(self) -> ArrayData { self.inner.into_data() }
       fn data_type(&self) -> &arrow_schema::DataType { self.inner.data_type() }
       fn slice(&self, o: usize, l: usize) -> Arc<dyn Array> {
           Arc::new(Evil {
               inner: self.inner. slice(o, l),
               fake_nulls: self.fake_nulls.slice(o, l),
           })
       }
       fn len(&self) -> usize { 1 }
       fn is_empty(&self) -> bool { false }
       fn offset(&self) -> usize { 0 }
       fn nulls(&self) -> Option<&NullBuffer> { Some(&self.fake_nulls) }
       fn logical_nulls(&self) -> Option<NullBuffer> {
           Some(NullBuffer::new_valid(2))
       }
       fn get_buffer_memory_size(&self) -> usize { 0 }
       fn get_array_memory_size(&self) -> usize { 0 }
   }
   
   impl ArrayAccessor for Evil {
       type Item = i32;
       fn value(&self, i: usize) -> i32 { self.inner.value(i) }
       unsafe fn value_unchecked(&self, i: usize) -> i32 {
           self.inner.value_unchecked(i)
       }
   }
   
   fn main() {
       let inner = Int32Array::from(vec![42, 100]);
       let fake_nulls = NullBuffer::from(vec![false]);  // [null]
       let a = Evil { inner:  inner. clone(), fake_nulls: fake_nulls.clone() };
       let b = Evil { inner, fake_nulls };
       let result = try_binary::<_, _, _, Int32Type>(a, b, |x, y| {
           println! ("  op({}, {})", x, y);
           Ok(x + y)
       });
       match result {
           Ok(arr) => println!("\n Result: {:?}", arr),
           Err(e) => println!("\n Error: {}", e),
       }
   }
   ```
   with cargo run,
   ```sh
   thread 'main' (3689217) panicked at 
/root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-arith-57.1.0/src/arity.rs:285:24:
   unsafe precondition(s) violated: slice::get_unchecked_mut requires that the 
index is within the slice
   
   This indicates a bug in the program. This Undefined Behavior check is 
optional, and cannot be relied on for safety.
   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
   thread caused non-unwinding panic. aborting.
   Aborted (core dumped)
   ```
   
   **Additional Context**
   Relevant PR: https://github.com/apache/arrow-rs/pull/9092


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to