jorgecarleitao edited a comment on pull request #9454: URL: https://github.com/apache/arrow/pull/9454#issuecomment-782800171
> I went to try and implement this, but it looks like it's already optimized as you suggest. `simd_` methods do:
>
> ```rust
> let mut result = MutableBuffer::new(buffer_size).with_bitset(buffer_size, false);
> ```
>
> to prepare the return buffer, which uses `memory::allocate_aligned` internally, which does not initialize the newly-allocated memory region.

Note that `with_bitset(buffer_size, false)` is a `malloc + memset`. So, we effectively do `malloc + memset (zeros) + memset (values)`.

One easy win is to use `MutableBuffer::from_len_zeroed()`, which replaces `malloc + memset (zeros)` with `calloc` (likely faster, as it is a single call and the allocator can hand back pre-zeroed pages, though LLVM may already perform this optimization).

What I was thinking was to use `MutableBuffer::with_capacity(buffer_size)` and introduce a method to write SIMD types (e.g. `i64x16`) directly to the `MutableBuffer`, allowing us to convert all our SIMD kernels to `malloc + memset (values)`.

This is not for this PR, though; it was just a comment that we may be able to do more here.