auto vectorization notes

Bruce Carneal via Digitalmars-d-learn Mon, 23 Mar 2020 11:56:01 -0700

When speeds are equivalent, or very close, I usually prefer autovectorized code to explicit SIMD/__vector code, as it's easier to read. (On the downside, you have to guard against compiler code-gen performance regressions.)

One oddity I've noticed is that I sometimes need to use pragma(inline, *false*) in order to get ldc to "do the right thing". Apparently the compiler sees the costs/benefits differently in the standalone context.
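As a minimal sketch of what that looks like (the kernel name `saxpy` and its signature are hypothetical, chosen only for illustration), the pragma is placed directly on the hot function so that it is compiled as a standalone unit rather than folded into its callers:

```d
// Hypothetical kernel; name and signature are illustrative assumptions.
// pragma(inline, false) asks the compiler not to inline this function,
// so its loop is optimized in the standalone context.
pragma(inline, false)
void saxpy(float[] y, const(float)[] x, float a) @nogc nothrow pure
{
    assert(x.length == y.length);
    foreach (i; 0 .. y.length)
        y[i] += a * x[i];   // simple, unit-stride indexing
}
```

Whether this actually changes the generated code depends on the compiler version and flags, so it is worth checking the assembly both with and without the pragma.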

More widely known techniques that have gotten people over the serial/SIMD hump include:

 1) simplified indexing relationships
 2) known count inner loops (chunkify)
 3) static foreach blocks (manual inlining that the compiler "gathers")
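A rough sketch of how the three techniques above can combine in one function. The function name `chunkedSum` and the chunk width of 8 are assumptions made for this example, not recommendations; the right width depends on the target.

```d
enum chunk = 8; // assumed chunk width; tune for the target's vector registers

// Hypothetical example: sums a slice using a fixed-width inner block.
float chunkedSum(const(float)[] a) @nogc nothrow pure
{
    float[chunk] acc = 0;   // one partial sum per lane
    size_t i = 0;

    // Outer loop advances one whole chunk at a time (known-count inner work).
    for (; i + chunk <= a.length; i += chunk)
    {
        // static foreach expands at compile time into `chunk` statements
        // with simple a[i + k] indexing, which the compiler may gather
        // into vector operations.
        static foreach (k; 0 .. chunk)
            acc[k] += a[i + k];
    }

    // Combine the per-lane partial sums.
    float total = 0;
    foreach (v; acc)
        total += v;

    // Scalar tail for the leftover elements.
    for (; i < a.length; ++i)
        total += a[i];

    return total;
}
```

Note that splitting the sum into per-lane accumulators changes the order of the floating-point additions, so results can differ slightly from a straight serial sum.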

I'd be interested to hear from others regarding their autovectorization and __vector experiences. What has worked and what hasn't worked in your performance sensitive dlang code?
