Manishearth opened a new issue, #10252: URL: https://github.com/apache/arrow-rs/issues/10252
> [!NOTE] > This finding was identified during an agentic unsafe Rust code review performed by Gemini AI, followed by human review and verification. N.b. this was tested on v56, and from quickly looking at the code this appears to also be present on `main`. ### The Issue The public trait `ToByteSlice` in this crate is safe and unsealed. https://github.com/apache/arrow-rs/blob/d8d3fa32c107ecae6c77c6f1db7700e2c7eb60ef/arrow-buffer/src/native.rs#L268-L271 Because it is a safe trait, downstream safe code can implement it for any custom type and return a slice shorter than `std::mem::size_of::<Self>()`. When such a custom type is passed to `MutableBuffer::push`, the method calculates `additional = std::mem::size_of::<T>()` and executes `std::ptr::copy_nonoverlapping` reading `additional` bytes from `item.to_byte_slice().as_ptr()`. https://github.com/apache/arrow-rs/blob/d8d3fa32c107ecae6c77c6f1db7700e2c7eb60ef/arrow-buffer/src/buffer/mutable.rs#L620-L629 If `to_byte_slice()` returned a short slice, this triggers an out-of-bounds memory read, violating Rust safety guarantees without any `unsafe` blocks in caller code. Similar unsoundness also affects `MutableBuffer::extend_from_iter`, `MutableBuffer::from_trusted_len_iter`, and `MutableBuffer::try_from_trusted_len_iter`. <details><summary>Minimal Reproduction (Miri)</summary> ```rust use arrow_buffer::{MutableBuffer, ToByteSlice}; struct BadType(u64); impl ToByteSlice for BadType { fn to_byte_slice(&self) -> &[u8] { &[0x42] } } fn main() { let mut buffer = MutableBuffer::new(0); buffer.push(BadType(0)); } ``` ```text error: Undefined Behavior: memory access failed: attempting to access 8 bytes, but got alloc2 which is only 1 byte from the end of the allocation --> /usr/local/google/home/manishearth/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-buffer-59.0.0/src/buffer/mutable.rs:626:13 | 626 | std::ptr::copy_nonoverlapping(src, dst, additional); | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Undefined Behavior occurred here | = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information = note: stack backtrace: 0: arrow_buffer::MutableBuffer::push::<BadType> at /usr/local/google/home/manishearth/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-buffer-59.0.0/src/buffer/mutable.rs:626:13: 626:64 1: main at src/bin/repro1.rs:13:5: 13:28 ``` </details> <details><summary>Suggested Fix</summary> Make `ToByteSlice` an `unsafe trait` with a documented safety contract requiring `to_byte_slice` to return a slice of length `std::mem::size_of::<Self>()`, or seal the trait (for example, by adding a private supertrait bound) so downstream crates cannot implement it for arbitrary types. Since `ToByteSlice` is primarily intended for `ArrowNativeType` (which is already sealed), sealing `ToByteSlice` is likely the cleanest solution. </details> -------------------------------------------------------------------------------- > [!NOTE] > The full audit report below also contains additional minor findings (such as missing safety comments or undocumented FFI assumptions) that are probably worth fixing as well but not the primary goal of this issue. The audit report has not been human-reviewed, it may contain misleading claims. <details><summary>Full Gemini Codebase Audit Report Appendix</summary> # Unsafe Rust Review: `arrow_buffer` (`v56`) ## Overall Safety Assessment `arrow_buffer` contains a significant amount of unsafe code, which is typical for a low-level buffer management library that interfaces with foreign memory (FFI) and performs highly optimized bitwise and array operations. The unsafe code is generally well-structured and follows logical invariants, but safety documentation is lacking. Specifically: - Struct invariants are not formally documented on fields using `// Safety invariant:`. - Many `unsafe` blocks lack `// SAFETY:` comments, and the existing comments are often brief and do not detail the formal proof obligations. - A critical soundness issue exists in the public API due to the safe `ToByteSlice` trait, which allows safe code to trigger out-of-bounds reads if implemented incorrectly by a downstream crate. ## Critical Findings ### Unsoundness in Public `ToByteSlice` Trait and `MutableBuffer::push` ๐ด ๐งช - **Priority**: ๐ด High - **Threat Vector**: ๐งช Contrived Setup - **Bug Type**: `Trait Trust` The trait `ToByteSlice` is defined as: ```rust pub trait ToByteSlice { fn to_byte_slice(&self) -> &[u8]; } ``` This is a public, safe, and unsealed trait. The function `MutableBuffer::push` is defined as: ```rust pub fn push<T: ToByteSlice>(&mut self, item: T) { let additional = std::mem::size_of::<T>(); self.reserve(additional); unsafe { let src = item.to_byte_slice().as_ptr(); let dst = self.data.as_ptr().add(self.len); std::ptr::copy_nonoverlapping(src, dst, additional); } self.len += additional; } ``` Because `ToByteSlice` is safe, a downstream crate can implement it for a custom type of size 8, but return a slice of length 1: ```rust struct MyType(u64); impl ToByteSlice for MyType { fn to_byte_slice(&self) -> &[u8] { &[0] } } ``` Calling `buffer.push(MyType(0))` is safe, but it will execute `std::ptr::copy_nonoverlapping` with `additional = 8` and `src` pointing to a 1-byte slice. This triggers an out-of-bounds read of 7 bytes, which is undefined behavior. Similar unsoundness affects: - `MutableBuffer::push_unchecked` - `MutableBuffer::extend_from_iter` - `MutableBuffer::from_trusted_len_iter` - `MutableBuffer::try_from_trusted_len_iter` **Remedy:** Make `ToByteSlice` an `unsafe trait` (stating in the safety docs that `to_byte_slice` must return a slice of length equal to `size_of::<Self>()`), or seal the trait (e.g., make it depend on a private supertrait) so that downstream crates cannot implement it. Since it is only intended to be implemented for `ArrowNativeType` (which is already sealed), sealing it is the best option. ## Fishy Findings None. ## Missing Safety Comments Here are the key places where safety comments are missing or inadequate, along with proposed comments: ### 1. `src/bytes.rs:214` - `Bytes::deref` ๐ก - **Priority**: ๐ก Low - **Bug Type**: `Missing Safety Comment` ```rust // SAFETY: // - `self.ptr` is non-null and aligned to the alignment specified in `self.deallocation` (which is at least 1). // - `self.ptr` is valid for reads of `self.len` bytes by the invariant of `Bytes`. // - The memory is owned by `self` and cannot be mutated for the duration of the returned slice. unsafe { slice::from_raw_parts(self.ptr.as_ptr(), self.len) } ``` ### 2. `src/bytes.rs:202` - `Bytes::drop` ๐ก - **Priority**: ๐ก Low - **Bug Type**: `Missing Safety Comment` ```rust // SAFETY: // - `self.ptr` was allocated by the global allocator with `*layout` layout, which is guaranteed by the invariant of `Bytes` when `self.deallocation` is `Deallocation::Standard`. unsafe { std::alloc::dealloc(self.ptr.as_ptr(), *layout) } ``` ### 3. `src/buffer/immutable.rs:249` - `Buffer::as_slice` ๐ก - **Priority**: ๐ก Low - **Bug Type**: `Missing Safety Comment` ```rust // SAFETY: // - `self.ptr` is non-null and aligned for `u8`. // - `self.ptr` is valid for reads of `self.length` bytes because it is derived from `self.data` (which is valid for at least `self.length` bytes starting at `self.ptr`). // - The underlying `Bytes` buffer is immutable and shared, so the memory will not be mutated for the lifetime of the slice. unsafe { std::slice::from_raw_parts(self.ptr, self.length) } ``` ### 4. `src/buffer/mutable.rs:176` - `MutableBuffer::with_bitset` ๐ก - **Priority**: ๐ก Low - **Bug Type**: `Missing Safety Comment` ```rust // SAFETY: // - `self.data` is valid for writes of `self.layout.size()` bytes. // - `end <= self.layout.size()` is checked by the assertion above. // - Alignment of `u8` is 1, so `self.data` is aligned. unsafe { std::ptr::write_bytes(self.data.as_ptr(), v, end); self.len = end; } ``` ### 5. `src/buffer/mutable.rs:231` - `MutableBuffer::reallocate` (dealloc) ๐ก - **Priority**: ๐ก Low - **Bug Type**: `Missing Safety Comment` ```rust // SAFETY: // - `self.data` was allocated using the global allocator with layout `self.layout`. // - `self.layout` is non-zero in size. unsafe { std::alloc::dealloc(self.as_mut_ptr(), self.layout) }; ``` ### 6. `src/buffer/mutable.rs:241` - `MutableBuffer::reallocate` (realloc) ๐ก - **Priority**: ๐ก Low - **Bug Type**: `Missing Safety Comment` ```rust // SAFETY: // - `self.data` was allocated using the global allocator with layout `self.layout`. // - `capacity` is non-zero. // - `capacity` does not overflow `isize::MAX` when rounded to alignment (guaranteed by `new_layout` creation). unsafe { std::alloc::realloc(self.as_mut_ptr(), self.layout, capacity) } ``` ### 7. `src/buffer/mutable.rs:431` - `MutableBuffer::extend_from_slice` ๐ก - **Priority**: ๐ก Low - **Bug Type**: `Missing Safety Comment` ```rust // SAFETY: // - `src` is valid for reads of `additional` bytes because it's derived from `items` slice. // - `dst` is valid for writes of `additional` bytes because `self.reserve(additional)` guarantees the buffer capacity is sufficient. // - `items` is a borrow of independent memory, so `src` and `dst` do not overlap. std::ptr::copy_nonoverlapping(src, dst, additional) ``` ### 8. `src/bigint/div.rs:293` - `ArrayPlusOne::deref` ๐ก - **Priority**: ๐ก Low - **Bug Type**: `Missing Safety Comment` ```rust // SAFETY: // - `self` is `#[repr(C)] struct ArrayPlusOne([T; N], T)`, which lays out `[T; N]` contiguously with the final `T` element without padding. // - Thus `self` is a contiguous block of `N + 1` elements of type `T`. // - Casting `self as *const Self` to `*const T` is aligned and valid for reads of `N + 1` elements. unsafe { std::slice::from_raw_parts(x as *const T, N + 1) } ``` </details> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
