alamb commented on code in PR #3756:
URL: https://github.com/apache/arrow-rs/pull/3756#discussion_r1118756567


##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -69,6 +71,21 @@ impl Buffer {
         }
     }
 
+    /// Create a [`Buffer`] from the provided `Vec` without copying
+    #[inline]
+    pub fn from_vec<T: ArrowNativeType>(vec: Vec<T>) -> Self {
+        // Safety
+        // Vec::as_ptr guaranteed to not be null and ArrowNativeType are 
trivially transmutable
+        let ptr = unsafe { NonNull::new_unchecked(vec.as_ptr() as _) };

Review Comment:
   It seems like `into_raw_parts` would be ideal here, but sadly it appears to 
not yet be stabalized
   
   https://doc.rust-lang.org/beta/std/vec/struct.Vec.html#method.into_raw_parts
   
   I note that the docs say 
   
   > After calling this function, the caller is responsible for the memory 
previously managed by the Vec. The only way to do this is to convert the raw 
pointer, length, and capacity back into a Vec with the 
[from_raw_parts](https://github.com/apache/arrow-rs/pull/3756/struct.Vec.html#method.from_raw_parts)
 function, allowing the destructor to perform the cleanup.
   
   However, this code uses `Deallocation::Standard(layout)` to deallocate 
memory, which seems ok, I just wanted to point out it doesn't match the docs 
(though maybe this is fine)



##########
arrow-buffer/src/alloc/mod.rs:
##########
@@ -132,9 +137,8 @@ impl<T: RefUnwindSafe + Send + Sync> Allocation for T {}
 
 /// Mode of deallocating memory regions
 pub(crate) enum Deallocation {
-    /// An allocation of the given capacity that needs to be deallocated using 
arrows's cache aligned allocator.
-    /// See [allocate_aligned] and [free_aligned].
-    Arrow(usize),
+    /// An allocation using [`std::alloc`]
+    Standard(Layout),
     /// An allocation from an external source like the FFI interface or a Rust 
Vec.

Review Comment:
   I wonder if the comment for `Custom` should be updated to say like "external 
Rust Vec" or maybe remove the reference to Rust Vec entirely



##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -632,4 +690,115 @@ mod tests {
         let buffer = Buffer::from(MutableBuffer::from_len_zeroed(12));
         buffer.slice_with_length(2, usize::MAX);
     }
+
+    #[test]
+    fn test_vec_interop() {
+        // Test empty vec
+        let a: Vec<i128> = Vec::new();
+        let b = Buffer::from_vec(a);
+        b.into_vec::<i128>().unwrap();
+
+        // Test vec with capacity
+        let a: Vec<i128> = Vec::with_capacity(20);
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<i128>().unwrap();
+        assert_eq!(back.len(), 0);
+        assert_eq!(back.capacity(), 20);
+
+        // Test vec with values
+        let mut a: Vec<i128> = Vec::with_capacity(3);
+        a.extend_from_slice(&[1, 2, 3]);
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<i128>().unwrap();
+        assert_eq!(back.len(), 3);
+        assert_eq!(back.capacity(), 3);
+
+        // Test vec with values and spare capacity
+        let mut a: Vec<i128> = Vec::with_capacity(20);
+        a.extend_from_slice(&[1, 4, 7, 8, 9, 3, 6]);
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<i128>().unwrap();
+        assert_eq!(back.len(), 7);
+        assert_eq!(back.capacity(), 20);
+
+        // Test incorrect alignment
+        let a: Vec<i128> = Vec::new();
+        let b = Buffer::from_vec(a);
+        let b = b.into_vec::<i32>().unwrap_err();
+        b.into_vec::<i8>().unwrap_err();
+
+        // Test convert between types with same alignment
+        // This is an implementation quirk, but isn't harmful
+        // as ArrowNativeType are trivially transmutable
+        let a: Vec<i64> = vec![1, 2, 3, 4];
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<u64>().unwrap();
+        assert_eq!(back.len(), 4);
+        assert_eq!(back.capacity(), 4);
+
+        // i256 has the same layout as i128 so this is valid
+        let mut b: Vec<i128> = Vec::with_capacity(4);
+        b.extend_from_slice(&[1, 2, 3, 4]);
+        let b = Buffer::from_vec(b);
+        let back = b.into_vec::<i256>().unwrap();
+        assert_eq!(back.len(), 2);
+        assert_eq!(back.capacity(), 2);
+
+        // Invalid layout
+        let b: Vec<i128> = vec![1, 2, 3];
+        let b = Buffer::from_vec(b);
+        b.into_vec::<i256>().unwrap_err();
+
+        // Invalid layout
+        let mut b: Vec<i128> = Vec::with_capacity(5);
+        b.extend_from_slice(&[1, 2, 3, 4]);
+        let b = Buffer::from_vec(b);
+        b.into_vec::<i256>().unwrap_err();
+
+        // Truncates length
+        // This is an implementation quirk, but isn't harmful
+        let mut b: Vec<i128> = Vec::with_capacity(4);
+        b.extend_from_slice(&[1, 2, 3]);
+        let b = Buffer::from_vec(b);
+        let back = b.into_vec::<i256>().unwrap();

Review Comment:
   I think it is also worth a test when the capacity doesn't equally divide 
equally into the transmuted size -- like maybe change this test to have 
capacity `5` with `i128` and verify the `i256` output Vec only has capacity 2 



##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -69,6 +71,21 @@ impl Buffer {
         }
     }
 
+    /// Create a [`Buffer`] from the provided `Vec` without copying
+    #[inline]
+    pub fn from_vec<T: ArrowNativeType>(vec: Vec<T>) -> Self {
+        // Safety
+        // Vec::as_ptr guaranteed to not be null and ArrowNativeType are 
trivially transmutable
+        let ptr = unsafe { NonNull::new_unchecked(vec.as_ptr() as _) };
+        let len = vec.len() * std::mem::size_of::<T>();
+        // Safety
+        // Layout guaranteed to be valid
+        let layout = unsafe { 
Layout::array::<T>(vec.capacity()).unwrap_unchecked() };

Review Comment:
   Is there a reason to use unwrap_unchecked here? Maybe it would be better to 
panic if the layout was messed up somehow? 
   
   Looks like it errors on overflow 
https://doc.rust-lang.org/beta/std/alloc/struct.Layout.html#method.array
   
   Like maybe the size overflows because someone passed in a giant Vec or 
something?
   
   🤔 



##########
arrow-buffer/src/buffer/scalar.rs:
##########
@@ -90,6 +90,15 @@ impl<T: ArrowNativeType> From<Buffer> for ScalarBuffer<T> {
     }
 }
 
+impl<T: ArrowNativeType> From<Vec<T>> for ScalarBuffer<T> {

Review Comment:
   😍 



##########
arrow-buffer/src/buffer/mutable.rs:
##########
@@ -95,12 +102,14 @@ impl MutableBuffer {
 
     /// Allocates a new [MutableBuffer] from given `Bytes`.

Review Comment:
   maybe it is worth updating these docs to explain when an Err is returned



##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -632,4 +690,115 @@ mod tests {
         let buffer = Buffer::from(MutableBuffer::from_len_zeroed(12));
         buffer.slice_with_length(2, usize::MAX);
     }
+
+    #[test]
+    fn test_vec_interop() {
+        // Test empty vec
+        let a: Vec<i128> = Vec::new();
+        let b = Buffer::from_vec(a);
+        b.into_vec::<i128>().unwrap();
+
+        // Test vec with capacity
+        let a: Vec<i128> = Vec::with_capacity(20);
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<i128>().unwrap();
+        assert_eq!(back.len(), 0);
+        assert_eq!(back.capacity(), 20);
+
+        // Test vec with values
+        let mut a: Vec<i128> = Vec::with_capacity(3);
+        a.extend_from_slice(&[1, 2, 3]);
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<i128>().unwrap();
+        assert_eq!(back.len(), 3);
+        assert_eq!(back.capacity(), 3);
+
+        // Test vec with values and spare capacity
+        let mut a: Vec<i128> = Vec::with_capacity(20);
+        a.extend_from_slice(&[1, 4, 7, 8, 9, 3, 6]);
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<i128>().unwrap();
+        assert_eq!(back.len(), 7);
+        assert_eq!(back.capacity(), 20);
+
+        // Test incorrect alignment
+        let a: Vec<i128> = Vec::new();
+        let b = Buffer::from_vec(a);
+        let b = b.into_vec::<i32>().unwrap_err();
+        b.into_vec::<i8>().unwrap_err();
+
+        // Test convert between types with same alignment
+        // This is an implementation quirk, but isn't harmful
+        // as ArrowNativeType are trivially transmutable
+        let a: Vec<i64> = vec![1, 2, 3, 4];
+        let b = Buffer::from_vec(a);
+        let back = b.into_vec::<u64>().unwrap();
+        assert_eq!(back.len(), 4);
+        assert_eq!(back.capacity(), 4);
+
+        // i256 has the same layout as i128 so this is valid
+        let mut b: Vec<i128> = Vec::with_capacity(4);
+        b.extend_from_slice(&[1, 2, 3, 4]);
+        let b = Buffer::from_vec(b);
+        let back = b.into_vec::<i256>().unwrap();
+        assert_eq!(back.len(), 2);
+        assert_eq!(back.capacity(), 2);
+
+        // Invalid layout
+        let b: Vec<i128> = vec![1, 2, 3];
+        let b = Buffer::from_vec(b);
+        b.into_vec::<i256>().unwrap_err();
+
+        // Invalid layout
+        let mut b: Vec<i128> = Vec::with_capacity(5);
+        b.extend_from_slice(&[1, 2, 3, 4]);
+        let b = Buffer::from_vec(b);
+        b.into_vec::<i256>().unwrap_err();
+
+        // Truncates length
+        // This is an implementation quirk, but isn't harmful
+        let mut b: Vec<i128> = Vec::with_capacity(4);
+        b.extend_from_slice(&[1, 2, 3]);
+        let b = Buffer::from_vec(b);
+        let back = b.into_vec::<i256>().unwrap();
+        assert_eq!(back.len(), 1);
+        assert_eq!(back.capacity(), 2);
+
+        // Cannot use aligned allocation
+        let b = Buffer::from(MutableBuffer::new(10));
+        let b = b.into_vec::<u8>().unwrap_err();
+        b.into_vec::<u64>().unwrap_err();
+
+        // Test slicing
+        let mut a: Vec<i128> = Vec::with_capacity(20);
+        a.extend_from_slice(&[1, 4, 7, 8, 9, 3, 6]);
+        let b = Buffer::from_vec(a);
+        let slice = b.slice_with_length(0, 64);
+
+        // Shared reference fails
+        let slice = slice.into_vec::<i128>().unwrap_err();
+        drop(b);
+
+        // Succeeds as no outstanding shared reference
+        let back = slice.into_vec::<i128>().unwrap();
+        assert_eq!(&back, &[1, 4, 7, 8]);
+        assert_eq!(back.capacity(), 20);
+
+        // Slicing by non-multiple length truncates
+        let mut a: Vec<i128> = Vec::with_capacity(8);
+        a.extend_from_slice(&[1, 4, 7, 3]);
+
+        let b = Buffer::from_vec(a);
+        let slice = b.slice_with_length(0, 34);
+        drop(b);
+
+        let back = slice.into_vec::<i128>().unwrap();

Review Comment:
   Is slicing with a non zero offset also covered? Maybe `slice(2)` below 
covers that 🤔 



##########
arrow-buffer/src/buffer/immutable.rs:
##########
@@ -632,4 +690,83 @@ mod tests {
         let buffer = Buffer::from(MutableBuffer::from_len_zeroed(12));
         buffer.slice_with_length(2, usize::MAX);
     }
+
+    #[test]
+    fn test_vec_interop() {

Review Comment:
   > this is perhaps not ideal but isn't harmful.
   
   Is it a problem for something like converting integers to (invalid) floats 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to