ShiKaiWi commented on PR #6281:
URL: https://github.com/apache/arrow-rs/pull/6281#issuecomment-2302539879

   > This is UB as per the safety contract on Vec::set_len. I suggest you take 
a look at MaybeUninit for how to do this sort of thing safely, and also why 
what you're doing here is UB
   
   @tustvold Thanks a lot for your suggestions.
   
   >  I suggest you take a look at MaybeUninit for how to do this sort of thing 
safely
   
   The `Vec<u8>` will be used by the snnapy (de)compression, so I guess 
introducing `MaybeUninit` is not allowed if we try to change the signature of 
the buffer.
   
   > why what you're doing here is UB
   
   Actually, this is UB here for the memory between `buf[old_len..new_len]` is 
uninitialized, but the tests in the CI works fine. And I guess the reason for 
this is that the uninitialized content won't be read before writes (or 
initialization) in the (de)compression process.
   
   And I find something interesting in the 
[example](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=5321aaeda859ea9f664ae7952ade2fb6)
 mentioned at #6276, `vec!` macro also performs much better than `resize` (20x 
faster), so I guess there are some magic in the `vec!` macro.
   
   After digging into the std source code about `vec!` macro, I find a fast 
path for the initialization of `Vec<u8>` and `Vec<i8>`. And the two code 
snippets show the fast path:
   
   
https://github.com/rust-lang/rust/blob/982c6f8721416431ec62bb0b9105c0578a9fc603/library/alloc/src/vec/mod.rs#L2752-L2754
   ```rust
   pub fn from_elem<T: Clone>(elem: T, n: usize) -> Vec<T> {
       <T as SpecFromElem>::from_elem(elem, n, Global)
   }
   ```
   
   
https://github.com/rust-lang/rust/blob/982c6f8721416431ec62bb0b9105c0578a9fc603/library/alloc/src/vec/spec_from_elem.rs#L47-L60
   ```rust
   impl SpecFromElem for u8 {
       #[inline]
       fn from_elem<A: Allocator>(elem: u8, n: usize, alloc: A) -> Vec<u8, A> {
           if elem == 0 {
               return Vec { buf: RawVec::with_capacity_zeroed_in(n, alloc), 
len: n };
           }
           let mut v = Vec::with_capacity_in(n, alloc);
           unsafe {
               ptr::write_bytes(v.as_mut_ptr(), elem, n);
               v.set_len(n);
           }
           v
       }
   }
   ```
   
   And a new 
[example](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=a407cda52e30f36bf730be2709c7bf5e)
 shows using the `ptr::write_bytes` + `Vec::set_len` performs much better than 
`resize` too.
   
   So how about using `ptr::write_bytes` to do initialization work before 
calling `Vec::set_len`? @tustvold Looking forward to your further suggestions!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to