ShiKaiWi commented on PR #6281: URL: https://github.com/apache/arrow-rs/pull/6281#issuecomment-2302539879
> This is UB as per the safety contract on Vec::set_len. I suggest you take a look at MaybeUninit for how to do this sort of thing safely, and also why what you're doing here is UB @tustvold Thanks a lot for your suggestions. > I suggest you take a look at MaybeUninit for how to do this sort of thing safely The `Vec<u8>` will be used by the snnapy (de)compression, so I guess introducing `MaybeUninit` is not allowed if we try to change the signature of the buffer. > why what you're doing here is UB Actually, this is UB here for the memory between `buf[old_len..new_len]` is uninitialized, but the tests in the CI works fine. And I guess the reason for this is that the uninitialized content won't be read before writes (or initialization) in the (de)compression process. And I find something interesting in the [example](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=5321aaeda859ea9f664ae7952ade2fb6) mentioned at #6276, `vec!` macro also performs much better than `resize` (20x faster), so I guess there are some magic in the `vec!` macro. After digging into the std source code about `vec!` macro, I find a fast path for the initialization of `Vec<u8>` and `Vec<i8>`. And the two code snippets show the fast path: https://github.com/rust-lang/rust/blob/982c6f8721416431ec62bb0b9105c0578a9fc603/library/alloc/src/vec/mod.rs#L2752-L2754 ```rust pub fn from_elem<T: Clone>(elem: T, n: usize) -> Vec<T> { <T as SpecFromElem>::from_elem(elem, n, Global) } ``` https://github.com/rust-lang/rust/blob/982c6f8721416431ec62bb0b9105c0578a9fc603/library/alloc/src/vec/spec_from_elem.rs#L47-L60 ```rust impl SpecFromElem for u8 { #[inline] fn from_elem<A: Allocator>(elem: u8, n: usize, alloc: A) -> Vec<u8, A> { if elem == 0 { return Vec { buf: RawVec::with_capacity_zeroed_in(n, alloc), len: n }; } let mut v = Vec::with_capacity_in(n, alloc); unsafe { ptr::write_bytes(v.as_mut_ptr(), elem, n); v.set_len(n); } v } } ``` And a new [example](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=a407cda52e30f36bf730be2709c7bf5e) shows using the `ptr::write_bytes` + `Vec::set_len` performs much better than `resize` too. So how about using `ptr::write_bytes` to do initialization work before calling `Vec::set_len`? @tustvold Looking forward to your further suggestions! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
