This is an automated email from the ASF dual-hosted git repository.
alamb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git
The following commit(s) were added to refs/heads/master by this push:
new 9485897cc Minor: Document SIMD rationale and tips (#6554)
9485897cc is described below
commit 9485897ccb6da955a3efeba84e552e85d4efaa20
Author: Andrew Lamb <[email protected]>
AuthorDate: Thu Oct 17 06:45:37 2024 -0400
Minor: Document SIMD rationale and tips (#6554)
* Minor: Document SIMD rationale and tips
* Apply suggestions from code review
Co-authored-by: Ed Seidl <[email protected]>
Co-authored-by: Piotr Findeisen <[email protected]>
* More review feedback
* tweak
* Update arrow/CONTRIBUTING.md
* Update arrow/CONTRIBUTING.md
* clarify inlining more
* formatting
---------
Co-authored-by: Ed Seidl <[email protected]>
Co-authored-by: Piotr Findeisen <[email protected]>
---
arrow/CONTRIBUTING.md | 36 ++++++++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)
diff --git a/arrow/CONTRIBUTING.md b/arrow/CONTRIBUTING.md
index 0c795d6b9..a9a9426a4 100644
--- a/arrow/CONTRIBUTING.md
+++ b/arrow/CONTRIBUTING.md
@@ -109,6 +109,42 @@ specific JIRA issues and reference them in these code
comments. For example:
// This is not sound because .... see
https://issues.apache.org/jira/browse/ARROW-nnnnn
```
+### Usage of SIMD / auto vectorization
+
+This crate does not use SIMD intrinsics (e.g. [`std::simd`]) directly, but
+instead relies on the Rust compiler's auto-vectorization capabilities, which
+are built on LLVM.
+
+SIMD intrinsics are difficult to maintain and hard to reason about.
+The auto-vectorizer in LLVM is quite good and often produces kernels that are
+faster than using hand-written SIMD intrinsics. This crate used to contain
+several kernels with hand-written SIMD instructions, which were removed after
+discovering the auto-vectorized code was faster.
+
+[`std::simd`]: https://doc.rust-lang.org/std/simd/index.html
+
+#### Tips for auto vectorization
+
+LLVM is relatively good at vectorizing vertical operations provided:
+
+1. No conditionals within the loop body (e.g. no checking for nulls on each row)
+2. Not too much inlining (judicious use of `#[inline]` and `#[inline(never)]`),
+   as the vectorizer gives up if the code is too complex
+3. No [horizontal reductions] or data dependencies
+4. Suitable SIMD instructions available in the target ISA (e.g. via the
+   `target-cpu` flag in `RUSTFLAGS`)
+
+[horizontal reductions]: https://rust-lang.github.io/packed_simd/perf-guide/vert-hor-ops.html
+
+The last point is especially important, as the default `target-cpu` doesn't
+support many SIMD instructions. See the Performance Tips section at the
+end of <https://crates.io/crates/arrow>.
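As a hedged sketch (the exact flags below are illustrative, not arrow-rs-specific configuration), a local benchmark build that enables more SIMD instructions might look like:

```shell
# Build with all SIMD extensions of the build machine's CPU enabled
# (the resulting binary may not run on older CPUs lacking those features).
RUSTFLAGS="-C target-cpu=native" cargo build --release

# Alternatively, opt in to specific instruction set extensions only, e.g. AVX2:
RUSTFLAGS="-C target-feature=+avx2" cargo build --release
```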
+
+To ensure your code is fully vectorized, we recommend using tools like
+<https://rust.godbolt.org/> (again making sure `RUSTFLAGS` is set
+appropriately) to analyze the resulting code, and reaching for manual SIMD
+only once you have exhausted auto-vectorization. Generally the hard part of
+vectorizing code is structuring the algorithm so that it can be vectorized,
+regardless of what generates those instructions.
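As a minimal illustration of tip 1 (not an actual arrow-rs kernel; the function name is hypothetical), a branch-free vertical operation over slices is the kind of loop LLVM typically auto-vectorizes:

```rust
/// Adds two equal-length `f32` slices element-wise into `out`.
/// The loop body has no conditionals or data dependencies, so LLVM can
/// usually auto-vectorize it when suitable SIMD instructions are available.
fn add_f32(a: &[f32], b: &[f32], out: &mut [f32]) {
    assert_eq!(a.len(), b.len());
    assert_eq!(a.len(), out.len());
    // Iterating over zipped slices lets the compiler elide per-element
    // bounds checks, which would otherwise add branches to the loop body.
    for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
        *o = x + y;
    }
}

fn main() {
    let a = vec![1.0_f32; 8];
    let b = vec![2.0_f32; 8];
    let mut out = vec![0.0_f32; 8];
    add_f32(&a, &b, &mut out);
    assert!(out.iter().all(|&v| v == 3.0));
}
```

Handling validity (null) bitmaps outside this hot loop, rather than checking each row, is what keeps the body conditional-free.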
+
# Releases and publishing to crates.io
Please see the [release](../dev/release/README.md) for details on how to
create arrow releases