Manishearth opened a new issue, #10252:
URL: https://github.com/apache/arrow-rs/issues/10252

   
   > [!NOTE]
   > This finding was identified during an agentic unsafe Rust code review 
performed by Gemini AI, followed by human review and verification.
   
   
   N.b. this was tested on v56, and from quickly looking at the code this 
appears to also be present on `main`.
   
   ### The Issue
   
   The public trait `ToByteSlice` in this crate is safe and unsealed.
   
   
https://github.com/apache/arrow-rs/blob/d8d3fa32c107ecae6c77c6f1db7700e2c7eb60ef/arrow-buffer/src/native.rs#L268-L271
   
   
   Because it is a safe trait, downstream safe code can implement it for any 
custom type and return a slice shorter than `std::mem::size_of::<Self>()`.
   
   When such a custom type is passed to `MutableBuffer::push`, the method 
calculates `additional = std::mem::size_of::<T>()` and executes 
`std::ptr::copy_nonoverlapping` reading `additional` bytes from 
`item.to_byte_slice().as_ptr()`.
   
   
   
https://github.com/apache/arrow-rs/blob/d8d3fa32c107ecae6c77c6f1db7700e2c7eb60ef/arrow-buffer/src/buffer/mutable.rs#L620-L629
   
   
   If `to_byte_slice()` returned a short slice, this triggers an out-of-bounds 
memory read, violating Rust safety guarantees without any `unsafe` blocks in 
caller code. Similar unsoundness also affects 
`MutableBuffer::extend_from_iter`, `MutableBuffer::from_trusted_len_iter`, and 
`MutableBuffer::try_from_trusted_len_iter`.
   
   <details><summary>Minimal Reproduction (Miri)</summary>
   
   ```rust
   use arrow_buffer::{MutableBuffer, ToByteSlice};
   
   struct BadType(u64);
   
   impl ToByteSlice for BadType {
       fn to_byte_slice(&self) -> &[u8] {
           &[0x42]
       }
   }
   
   fn main() {
       let mut buffer = MutableBuffer::new(0);
       buffer.push(BadType(0));
   }
   ```
   
   ```text
   error: Undefined Behavior: memory access failed: attempting to access 8 
bytes, but got alloc2 which is only 1 byte from the end of the allocation
      --> 
/usr/local/google/home/manishearth/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-buffer-59.0.0/src/buffer/mutable.rs:626:13
       |
   626 |             std::ptr::copy_nonoverlapping(src, dst, additional);
       |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
Undefined Behavior occurred here
       |
       = help: this indicates a bug in the program: it performed an invalid 
operation, and caused Undefined Behavior
       = help: see 
https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html 
for further information
       = note: stack backtrace:
               0: arrow_buffer::MutableBuffer::push::<BadType>
                   at 
/usr/local/google/home/manishearth/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-buffer-59.0.0/src/buffer/mutable.rs:626:13:
 626:64
               1: main
                   at src/bin/repro1.rs:13:5: 13:28
   ```
   
   </details>
   
   <details><summary>Suggested Fix</summary>
   
   Make `ToByteSlice` an `unsafe trait` with a documented safety contract 
requiring
   `to_byte_slice` to return a slice of length `std::mem::size_of::<Self>()`, or
   seal the trait (for example, by adding a private supertrait bound) so 
downstream
   crates cannot implement it for arbitrary types. Since `ToByteSlice` is 
primarily
   intended for `ArrowNativeType` (which is already sealed), sealing 
`ToByteSlice`
   is likely the cleanest solution.
   
   </details>
   
   
--------------------------------------------------------------------------------
   
   > [!NOTE]
   > The full audit report below also contains additional minor findings (such 
as missing safety comments or undocumented FFI assumptions) that are probably 
worth fixing as well but not the primary goal of this issue. The audit report 
has not been human-reviewed, it may contain misleading claims.
   
   
   
   
   <details><summary>Full Gemini Codebase Audit Report Appendix</summary>
   
   # Unsafe Rust Review: `arrow_buffer` (`v56`)
   
   ## Overall Safety Assessment
   `arrow_buffer` contains a significant amount of unsafe code, which is 
typical for a low-level buffer management library that interfaces with foreign 
memory (FFI) and performs highly optimized bitwise and array operations.
   
   The unsafe code is generally well-structured and follows logical invariants, 
but safety documentation is lacking.
   Specifically:
   
   - Struct invariants are not formally documented on fields using `// Safety 
invariant:`.
   - Many `unsafe` blocks lack `// SAFETY:` comments, and the existing comments 
are often brief and do not detail the formal proof obligations.
   - A critical soundness issue exists in the public API due to the safe 
`ToByteSlice` trait, which allows safe code to trigger out-of-bounds reads if 
implemented incorrectly by a downstream crate.
   
   ## Critical Findings
   
   ### Unsoundness in Public `ToByteSlice` Trait and `MutableBuffer::push` ๐Ÿ”ด ๐Ÿงช
   
   -   **Priority**: ๐Ÿ”ด High
   - **Threat Vector**: ๐Ÿงช Contrived Setup
   
   
   -   **Bug Type**: `Trait Trust`
   
   The trait `ToByteSlice` is defined as:
   
   ```rust
   
   pub trait ToByteSlice {
       fn to_byte_slice(&self) -> &[u8];
   }
   
   ```
   
   This is a public, safe, and unsealed trait.
   
   The function `MutableBuffer::push` is defined as:
   
   ```rust
   
       pub fn push<T: ToByteSlice>(&mut self, item: T) {
           let additional = std::mem::size_of::<T>();
           self.reserve(additional);
           unsafe {
               let src = item.to_byte_slice().as_ptr();
               let dst = self.data.as_ptr().add(self.len);
               std::ptr::copy_nonoverlapping(src, dst, additional);
           }
           self.len += additional;
       }
   
   ```
   
   Because `ToByteSlice` is safe, a downstream crate can implement it for a 
custom type of size 8, but return a slice of length 1:
   
   ```rust
   
   struct MyType(u64);
   impl ToByteSlice for MyType {
       fn to_byte_slice(&self) -> &[u8] {
           &[0]
       }
   }
   
   ```
   
   Calling `buffer.push(MyType(0))` is safe, but it will execute 
`std::ptr::copy_nonoverlapping` with `additional = 8` and `src` pointing to a 
1-byte slice. This triggers an out-of-bounds read of 7 bytes, which is 
undefined behavior.
   
   Similar unsoundness affects:
   
   - `MutableBuffer::push_unchecked`
   - `MutableBuffer::extend_from_iter`
   - `MutableBuffer::from_trusted_len_iter`
   - `MutableBuffer::try_from_trusted_len_iter`
   
   **Remedy:**
   Make `ToByteSlice` an `unsafe trait` (stating in the safety docs that 
`to_byte_slice` must return a slice of length equal to `size_of::<Self>()`), or 
seal the trait (e.g., make it depend on a private supertrait) so that 
downstream crates cannot implement it. Since it is only intended to be 
implemented for `ArrowNativeType` (which is already sealed), sealing it is the 
best option.
   
   ## Fishy Findings
   None.
   
   ## Missing Safety Comments
   Here are the key places where safety comments are missing or inadequate, 
along with proposed comments:
   
   ### 1. `src/bytes.rs:214` - `Bytes::deref` ๐ŸŸก
   
   -   **Priority**: ๐ŸŸก Low
   -   **Bug Type**: `Missing Safety Comment`
   
   ```rust
   
   // SAFETY:
   // - `self.ptr` is non-null and aligned to the alignment specified in 
`self.deallocation` (which is at least 1).
   // - `self.ptr` is valid for reads of `self.len` bytes by the invariant of 
`Bytes`.
   // - The memory is owned by `self` and cannot be mutated for the duration of 
the returned slice.
   unsafe { slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
   
   ```
   
   ### 2. `src/bytes.rs:202` - `Bytes::drop` ๐ŸŸก
   
   -   **Priority**: ๐ŸŸก Low
   -   **Bug Type**: `Missing Safety Comment`
   
   ```rust
   
   // SAFETY:
   // - `self.ptr` was allocated by the global allocator with `*layout` layout, 
which is guaranteed by the invariant of `Bytes` when `self.deallocation` is 
`Deallocation::Standard`.
   unsafe { std::alloc::dealloc(self.ptr.as_ptr(), *layout) }
   
   ```
   
   ### 3. `src/buffer/immutable.rs:249` - `Buffer::as_slice` ๐ŸŸก
   
   -   **Priority**: ๐ŸŸก Low
   -   **Bug Type**: `Missing Safety Comment`
   
   ```rust
   
   // SAFETY:
   // - `self.ptr` is non-null and aligned for `u8`.
   // - `self.ptr` is valid for reads of `self.length` bytes because it is 
derived from `self.data` (which is valid for at least `self.length` bytes 
starting at `self.ptr`).
   // - The underlying `Bytes` buffer is immutable and shared, so the memory 
will not be mutated for the lifetime of the slice.
   unsafe { std::slice::from_raw_parts(self.ptr, self.length) }
   
   ```
   
   ### 4. `src/buffer/mutable.rs:176` - `MutableBuffer::with_bitset` ๐ŸŸก
   
   -   **Priority**: ๐ŸŸก Low
   -   **Bug Type**: `Missing Safety Comment`
   
   ```rust
   
   // SAFETY:
   // - `self.data` is valid for writes of `self.layout.size()` bytes.
   // - `end <= self.layout.size()` is checked by the assertion above.
   // - Alignment of `u8` is 1, so `self.data` is aligned.
   unsafe {
       std::ptr::write_bytes(self.data.as_ptr(), v, end);
       self.len = end;
   }
   
   ```
   
   ### 5. `src/buffer/mutable.rs:231` - `MutableBuffer::reallocate` (dealloc) ๐ŸŸก
   
   -   **Priority**: ๐ŸŸก Low
   -   **Bug Type**: `Missing Safety Comment`
   
   ```rust
   
   // SAFETY:
   // - `self.data` was allocated using the global allocator with layout 
`self.layout`.
   // - `self.layout` is non-zero in size.
   unsafe { std::alloc::dealloc(self.as_mut_ptr(), self.layout) };
   
   ```
   
   ### 6. `src/buffer/mutable.rs:241` - `MutableBuffer::reallocate` (realloc) ๐ŸŸก
   
   -   **Priority**: ๐ŸŸก Low
   -   **Bug Type**: `Missing Safety Comment`
   
   ```rust
   
   // SAFETY:
   // - `self.data` was allocated using the global allocator with layout 
`self.layout`.
   // - `capacity` is non-zero.
   // - `capacity` does not overflow `isize::MAX` when rounded to alignment 
(guaranteed by `new_layout` creation).
   unsafe { std::alloc::realloc(self.as_mut_ptr(), self.layout, capacity) }
   
   ```
   
   ### 7. `src/buffer/mutable.rs:431` - `MutableBuffer::extend_from_slice` ๐ŸŸก
   
   -   **Priority**: ๐ŸŸก Low
   -   **Bug Type**: `Missing Safety Comment`
   
   ```rust
   
   // SAFETY:
   // - `src` is valid for reads of `additional` bytes because it's derived 
from `items` slice.
   // - `dst` is valid for writes of `additional` bytes because 
`self.reserve(additional)` guarantees the buffer capacity is sufficient.
   // - `items` is a borrow of independent memory, so `src` and `dst` do not 
overlap.
   std::ptr::copy_nonoverlapping(src, dst, additional)
   
   ```
   
   ### 8. `src/bigint/div.rs:293` - `ArrayPlusOne::deref` ๐ŸŸก
   
   -   **Priority**: ๐ŸŸก Low
   -   **Bug Type**: `Missing Safety Comment`
   
   ```rust
   
   // SAFETY:
   // - `self` is `#[repr(C)] struct ArrayPlusOne([T; N], T)`, which lays out 
`[T; N]` contiguously with the final `T` element without padding.
   // - Thus `self` is a contiguous block of `N + 1` elements of type `T`.
   // - Casting `self as *const Self` to `*const T` is aligned and valid for 
reads of `N + 1` elements.
   unsafe { std::slice::from_raw_parts(x as *const T, N + 1) }
   
   ```
   
   </details>
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to