Re: [PR] Minor: Document SIMD rationale and tips [arrow-rs]

via GitHub Sun, 13 Oct 2024 13:34:35 -0700


findepi commented on code in PR #6554:
URL: https://github.com/apache/arrow-rs/pull/6554#discussion_r1798554621



##########
arrow/CONTRIBUTING.md:
##########
@@ -109,6 +109,36 @@ specific JIRA issues and reference them in these code 
comments. For example:
 //      This is not sound because .... see 
https://issues.apache.org/jira/browse/ARROW-nnnnn
 ```
 
+### Usage if SIMD / Auto vectorization
+
+This create does not use SIMD intrinsics (e.g. [`std::simd`] directly, but
+instead relies on LLVM's auto-vectorization.

Review Comment:
   "... on the compiler's ..." ?
   
   (in fact, vectorization **could** be applied at the Rust MIR level, before LLVM?)



##########
arrow/CONTRIBUTING.md:
##########
@@ -109,6 +109,36 @@ specific JIRA issues and reference them in these code 
comments. For example:
 //      This is not sound because .... see 
https://issues.apache.org/jira/browse/ARROW-nnnnn
 ```
 
+### Usage if SIMD / Auto vectorization
+
+This create does not use SIMD intrinsics (e.g. [`std::simd`] directly, but
+instead relies on LLVM's auto-vectorization.
+
+SIMD intrinsics are difficult to maintain and can be difficult to reason about.
+The auto-vectorizer in LLVM is quite good and often produces better code than
+hand-written manual uses of SIMD. In fact, this crate used to to have a fair
+amount of manual SIMD, and over time we've removed it as the auto-vectorized
+code was faster.

Review Comment:
   was -> turned out ?



##########
arrow/CONTRIBUTING.md:
##########
@@ -109,6 +109,36 @@ specific JIRA issues and reference them in these code 
comments. For example:
 //      This is not sound because .... see 
https://issues.apache.org/jira/browse/ARROW-nnnnn
 ```
 
+### Usage if SIMD / Auto vectorization
+
+This create does not use SIMD intrinsics (e.g. [`std::simd`] directly, but
+instead relies on LLVM's auto-vectorization.
+
+SIMD intrinsics are difficult to maintain and can be difficult to reason about.
+The auto-vectorizer in LLVM is quite good and often produces better code than
+hand-written manual uses of SIMD. In fact, this crate used to to have a fair

Review Comment:
   stuttered "to"



##########
arrow/CONTRIBUTING.md:
##########
@@ -109,6 +109,36 @@ specific JIRA issues and reference them in these code 
comments. For example:
 //      This is not sound because .... see 
https://issues.apache.org/jira/browse/ARROW-nnnnn
 ```
 
+### Usage if SIMD / Auto vectorization
+
+This create does not use SIMD intrinsics (e.g. [`std::simd`] directly, but
+instead relies on LLVM's auto-vectorization.
+
+SIMD intrinsics are difficult to maintain and can be difficult to reason about.
+The auto-vectorizer in LLVM is quite good and often produces better code than
+hand-written manual uses of SIMD. In fact, this crate used to to have a fair
+amount of manual SIMD, and over time we've removed it as the auto-vectorized
+code was faster.
+
+[`std::simd`]: https://doc.rust-lang.org/std/simd/index.html
+
+LLVM is relatively good at vectorizing vertical operations provided:
+
+1. No conditionals within the loop body
+2. Not too much inlining , as the vectorizer gives up if the code is too 
complex
+3. No bitwise horizontal reductions or masking

Review Comment:
   is "bitwise horizontal reductions" an obvious term?
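
For context, one possible reading of the term, as a minimal sketch: a "vertical" operation combines two inputs lane by lane, while a "horizontal" reduction folds all lanes of one input into a single value. The function names below are hypothetical and are not taken from arrow-rs, and the sketch makes no claim about which of the two loops LLVM actually vectorizes.

```rust
/// Vertical (element-wise) operation: output element `i` depends only on
/// input elements `a[i]` and `b[i]`. Hypothetical helper, for illustration.
pub fn add_vertical(a: &[i32], b: &[i32]) -> Vec<i32> {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| x.wrapping_add(*y))
        .collect()
}

/// Horizontal reduction with a bitwise operator: every element of the
/// input is folded into one scalar. Hypothetical helper, for illustration.
pub fn or_horizontal(values: &[u32]) -> u32 {
    values.iter().fold(0, |acc, v| acc | v)
}
```

If this is the intended meaning, a short definition or link in the list item might help readers who have not met the term before.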



##########
arrow/CONTRIBUTING.md:
##########
@@ -109,6 +109,36 @@ specific JIRA issues and reference them in these code 
comments. For example:
 //      This is not sound because .... see 
https://issues.apache.org/jira/browse/ARROW-nnnnn
 ```
 
+### Usage if SIMD / Auto vectorization
+
+This create does not use SIMD intrinsics (e.g. [`std::simd`] directly, but
+instead relies on LLVM's auto-vectorization.
+
+SIMD intrinsics are difficult to maintain and can be difficult to reason about.
+The auto-vectorizer in LLVM is quite good and often produces better code than
+hand-written manual uses of SIMD. In fact, this crate used to to have a fair
+amount of manual SIMD, and over time we've removed it as the auto-vectorized
+code was faster.
+
+[`std::simd`]: https://doc.rust-lang.org/std/simd/index.html
+
+LLVM is relatively good at vectorizing vertical operations provided:
+
+1. No conditionals within the loop body
+2. Not too much inlining , as the vectorizer gives up if the code is too 
complex
+3. No bitwise horizontal reductions or masking
+4. You've enabled SIMD instructions in the target ISA (e.g. `target-cpu` 
`RUSTFLAGS` flag)
+
+The last point is especially important as the default `target-cpu` doesn't
+support many SIMD instructions. See the Performance Tips section at the
+end of <https://crates.io/crates/arrow>
+
+To ensure your code is fully vectorized, we recommend getting familiar with
+tools like <https://rust.godbolt.org/> (again being sure to set `RUSTFLAGS`) 
and
+only once you've exhausted that avenue think of reaching for manual SIMD.
+Generally the hard part is getting the algorithm structured in such a way that
+it can be vectorized, regardless of what goes and generates those instructions.

Review Comment:
   maybe 
   
   ```suggestion
   it can be vectorized, regardless of what generates those instructions.
   ```



##########
arrow/CONTRIBUTING.md:
##########
@@ -109,6 +109,36 @@ specific JIRA issues and reference them in these code 
comments. For example:
 //      This is not sound because .... see 
https://issues.apache.org/jira/browse/ARROW-nnnnn
 ```
 
+### Usage if SIMD / Auto vectorization
+
+This create does not use SIMD intrinsics (e.g. [`std::simd`] directly, but
+instead relies on LLVM's auto-vectorization.
+
+SIMD intrinsics are difficult to maintain and can be difficult to reason about.
+The auto-vectorizer in LLVM is quite good and often produces better code than
+hand-written manual uses of SIMD. In fact, this crate used to to have a fair
+amount of manual SIMD, and over time we've removed it as the auto-vectorized
+code was faster.
+
+[`std::simd`]: https://doc.rust-lang.org/std/simd/index.html
+
+LLVM is relatively good at vectorizing vertical operations provided:
+
+1. No conditionals within the loop body
+2. Not too much inlining , as the vectorizer gives up if the code is too 
complex

Review Comment:
   extra whitespace before `,`



##########
arrow/CONTRIBUTING.md:
##########
@@ -109,6 +109,36 @@ specific JIRA issues and reference them in these code 
comments. For example:
 //      This is not sound because .... see 
https://issues.apache.org/jira/browse/ARROW-nnnnn
 ```
 
+### Usage if SIMD / Auto vectorization
+
+This create does not use SIMD intrinsics (e.g. [`std::simd`] directly, but
+instead relies on LLVM's auto-vectorization.
+
+SIMD intrinsics are difficult to maintain and can be difficult to reason about.
+The auto-vectorizer in LLVM is quite good and often produces better code than
+hand-written manual uses of SIMD. In fact, this crate used to to have a fair
+amount of manual SIMD, and over time we've removed it as the auto-vectorized
+code was faster.
+
+[`std::simd`]: https://doc.rust-lang.org/std/simd/index.html
+
+LLVM is relatively good at vectorizing vertical operations provided:
+
+1. No conditionals within the loop body
+2. Not too much inlining , as the vectorizer gives up if the code is too 
complex
+3. No bitwise horizontal reductions or masking
+4. You've enabled SIMD instructions in the target ISA (e.g. `target-cpu` 
`RUSTFLAGS` flag)

Review Comment:
   Prefer passive voice. "SIMD instructions are enabled in the target ISA"



##########
arrow/CONTRIBUTING.md:
##########
@@ -109,6 +109,36 @@ specific JIRA issues and reference them in these code 
comments. For example:
 //      This is not sound because .... see 
https://issues.apache.org/jira/browse/ARROW-nnnnn
 ```
 
+### Usage if SIMD / Auto vectorization
+
+This create does not use SIMD intrinsics (e.g. [`std::simd`] directly, but
+instead relies on LLVM's auto-vectorization.
+
+SIMD intrinsics are difficult to maintain and can be difficult to reason about.
+The auto-vectorizer in LLVM is quite good and often produces better code than
+hand-written manual uses of SIMD. In fact, this crate used to to have a fair
+amount of manual SIMD, and over time we've removed it as the auto-vectorized
+code was faster.
+
+[`std::simd`]: https://doc.rust-lang.org/std/simd/index.html
+
+LLVM is relatively good at vectorizing vertical operations provided:
+
+1. No conditionals within the loop body
+2. Not too much inlining , as the vectorizer gives up if the code is too 
complex
+3. No bitwise horizontal reductions or masking
+4. You've enabled SIMD instructions in the target ISA (e.g. `target-cpu` 
`RUSTFLAGS` flag)
+
+The last point is especially important as the default `target-cpu` doesn't
+support many SIMD instructions. See the Performance Tips section at the
+end of <https://crates.io/crates/arrow>
+
+To ensure your code is fully vectorized, we recommend getting familiar with
+tools like <https://rust.godbolt.org/> (again being sure to set `RUSTFLAGS`) 
and

Review Comment:
   > again being sure to set `RUSTFLAGS`
   
   requires setting `RUSTFLAGS` properly
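
As a rough illustration of why the flags matter, the sketch below reports whether a given target feature was enabled at compile time. It assumes AVX2 as the example feature and uses a hypothetical function name; building with something like `RUSTFLAGS="-C target-cpu=native"` on a machine that supports AVX2 would be expected to change the result, while a default build typically would not enable it.

```rust
/// Returns true when the compiler was allowed to emit AVX2 instructions
/// for this build (for example via `-C target-cpu=...` or
/// `-C target-feature=+avx2`). Hypothetical helper, for illustration.
pub fn avx2_enabled_at_compile_time() -> bool {
    // `cfg!` is evaluated at compile time, so this reflects the flags the
    // crate was built with, not what the CPU running the binary supports.
    cfg!(target_feature = "avx2")
}
```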



##########
arrow/CONTRIBUTING.md:
##########
@@ -109,6 +109,36 @@ specific JIRA issues and reference them in these code 
comments. For example:
 //      This is not sound because .... see 
https://issues.apache.org/jira/browse/ARROW-nnnnn
 ```
 
+### Usage if SIMD / Auto vectorization
+
+This create does not use SIMD intrinsics (e.g. [`std::simd`] directly, but
+instead relies on LLVM's auto-vectorization.
+
+SIMD intrinsics are difficult to maintain and can be difficult to reason about.
+The auto-vectorizer in LLVM is quite good and often produces better code than
+hand-written manual uses of SIMD. In fact, this crate used to to have a fair
+amount of manual SIMD, and over time we've removed it as the auto-vectorized
+code was faster.
+
+[`std::simd`]: https://doc.rust-lang.org/std/simd/index.html
+
+LLVM is relatively good at vectorizing vertical operations provided:
+
+1. No conditionals within the loop body
+2. Not too much inlining , as the vectorizer gives up if the code is too 
complex
+3. No bitwise horizontal reductions or masking
+4. You've enabled SIMD instructions in the target ISA (e.g. `target-cpu` 
`RUSTFLAGS` flag)
+
+The last point is especially important as the default `target-cpu` doesn't
+support many SIMD instructions. See the Performance Tips section at the
+end of <https://crates.io/crates/arrow>
+
+To ensure your code is fully vectorized, we recommend getting familiar with

Review Comment:
   your code -> the code



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
