jorgecarleitao opened a new pull request #9235: URL: https://github.com/apache/arrow/pull/9235
# Rationale

Rust forbids safely accessing uninitialized memory because it is undefined behavior. However, when building `Buffer`s, it is important to be able to _write_ to uninitialized memory regions, thereby avoiding the need to write _something_ to them before using them. Currently, all our initializations are zeroed, which is expensive. #9076 modifies our allocator to allocate uninitialized regions. However, by itself, this is not useful if we do not offer any methods to write to those (uninitialized) regions.

# This PR

This PR is built on top of #9076 and introduces methods to build and extend a `MutableBuffer` from an iterator (and from an `ExactSizeIterator` where possible), thereby offering a safe API to efficiently grow a `MutableBuffer` without having to initialize memory regions with zeros (i.e. without `with_bitset` and the like).

The design is heavily inspired by `Vec`, with the catch that we use stable Rust (i.e. no trait specialization), and thus have to expose a few more methods than `Vec` does. Also, unfortunately, Rust does not support `collect()` for `ExactSizeIterator`, and `TrustedLen` is unstable, which means that we can't use that (nicer) `collect()`-based API for sized iterators.

The first commit is just a fix to a bench that was taking the creation of an array into account. The second commit is the most important and contains the new APIs. The last 2 commits are examples of what this API looks like and what it can achieve (benches below).

PS: using `ExactSizeIterator` is 2x faster than using `Iterator`. I have been fighting the compiler to try to get the same performance in both (as the only difference is a single branch), but the compiler is not being very friendly to me (related to https://github.com/rust-lang/rust/issues/32155).
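To illustrate the idea behind the two code paths described above, here is a minimal sketch that uses `Vec<T>` as a stand-in for `MutableBuffer`. The function names `extend_exact` and `extend_any` are hypothetical and are not the methods added in this PR; the sketch only shows how a known length allows a single reservation followed by writes into uninitialized capacity, while a plain `Iterator` has to check capacity on every item.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper (not this PR's API): extend from an iterator that
/// reports its exact length. Reserves once, then writes directly into the
/// uninitialized spare capacity without zeroing it first.
fn extend_exact<T: Copy, I: ExactSizeIterator<Item = T>>(buf: &mut Vec<T>, iter: I) {
    let additional = iter.len();
    buf.reserve(additional);

    // `spare_capacity_mut` exposes the allocated-but-uninitialized tail.
    let spare: &mut [MaybeUninit<T>] = buf.spare_capacity_mut();
    let mut written = 0;
    // `len()` is not trusted by the compiler, so we count what we really wrote.
    for (slot, item) in spare.iter_mut().zip(iter).take(additional) {
        slot.write(item);
        written += 1;
    }

    let new_len = buf.len() + written;
    // SAFETY: exactly `written` slots past the old length were initialized
    // above, and `new_len` cannot exceed the capacity we reserved.
    unsafe { buf.set_len(new_len) };
}

/// Hypothetical helper (not this PR's API): extend from any iterator.
/// Only the lower bound of `size_hint` can be reserved up front, so each
/// `push` still has to check (and possibly grow) the capacity.
fn extend_any<T: Copy, I: Iterator<Item = T>>(buf: &mut Vec<T>, iter: I) {
    let (lower, _) = iter.size_hint();
    buf.reserve(lower);
    for item in iter {
        buf.push(item);
    }
}

fn main() {
    let mut buf: Vec<i32> = Vec::new();
    // `Range<i32>` mapped through a closure is still an `ExactSizeIterator`.
    extend_exact(&mut buf, (0..4).map(|x| x * 2));
    // `filter` loses the exact length, so only the general path applies.
    extend_any(&mut buf, (0..10).filter(|x| x % 5 == 0));
    assert_eq!(buf, vec![0, 2, 4, 6, 0, 5]);
}
```

Counting the items actually written, rather than trusting `len()`, keeps the sketch sound even if an `ExactSizeIterator` implementation reports the wrong length.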
```bash
git checkout master
cargo bench --bench arithmetic_kernels
git checkout length_faster
cargo bench --bench arithmetic_kernels
git checkout 557e728b201ccb301b05e0cb1470782d37c6994c
cargo bench --bench length_kernel
git checkout length_faster
```

```
   Compiling arrow v3.0.0-SNAPSHOT (/Users/jorgecarleitao/projects/arrow/rust/arrow)
    Finished bench [optimized] target(s) in 1m 00s
     Running /Users/jorgecarleitao/projects/arrow/rust/target/release/deps/arithmetic_kernels-ec2cc20ce07d9b83
Gnuplot not found, using plotters backend
add 512                 time:   [509.72 ns 510.21 ns 510.69 ns]
                        change: [-24.729% -24.227% -23.740%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

subtract 512            time:   [498.20 ns 499.79 ns 501.36 ns]
                        change: [-25.168% -24.543% -23.948%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

multiply 512            time:   [498.28 ns 501.10 ns 504.14 ns]
                        change: [-32.237% -29.733% -27.551%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

divide 512              time:   [1.8751 us 1.8771 us 1.8790 us]
                        change: [-21.101% -20.410% -19.729%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe

limit 512, 512          time:   [360.62 ns 362.85 ns 365.08 ns]
                        change: [-4.0917% -2.8282% -1.6589%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

add_nulls_512           time:   [523.34 ns 525.34 ns 527.35 ns]
                        change: [-19.810% -19.242% -18.654%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

divide_nulls_512        time:   [1.8594 us 1.8606 us 1.8617 us]
                        change: [-21.900% -21.444% -20.974%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe
```

Length (against the commit that fixes the bench, `16bc7200f3baa6e526aea7135c60dcc949c9b592`, not master):

```
length                  time:   [1.5379 us 1.5408 us 1.5437 us]
                        change: [-97.311% -97.295% -97.278%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe
```
