Re: SIMD implementation of dot-product. Benchmarks

Ilya Yaroshenko Sat, 17 Aug 2013 21:40:46 -0700

On Sunday, 18 August 2013 at 01:53:53 UTC, Manu wrote:

It doesn't look like you account for alignment.
This is basically not-portable (I doubt unaligned loads in this context are faster than performing scalar operations), and possibly inefficient on x86
too.

dotProduct uses unaligned loads (__builtin_ia32_loadups256, __builtin_ia32_loadupd256) and it is up to 21 times faster than the trivial scalar version.


Why are unaligned loads not portable and inefficient?

To make it account for potentially random alignment will be awkward, but it
might be possible to do efficiently.

Did you mean using unaligned loads, or preparing the data for aligned loads at the beginning of the function?
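
For illustration, the second option (handling alignment at the beginning of the function) could look roughly like the sketch below. It is plain C rather than the D/GDC code under discussion, and `dot_peeled` and `peel_count` are hypothetical names. The inner loop over `LANES` only stands in for the place where an aligned 256-bit load of `a` would go. Note that only `a` is brought to a 32-byte boundary; `b` can have a different offset, so in general its loads would still have to be unaligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed vector width: 32 bytes (one 256-bit register), i.e. 8 floats. */
enum { VEC_BYTES = 32, LANES = VEC_BYTES / sizeof(float) };

/* Number of leading floats to handle one at a time before `p` reaches
 * a 32-byte boundary, capped at n. Assumes p is at least float-aligned. */
static size_t peel_count(const float *p, size_t n)
{
    size_t misalign = (size_t)((uintptr_t)p % VEC_BYTES);
    size_t peel = misalign ? (VEC_BYTES - misalign) / sizeof(float) : 0;
    return peel < n ? peel : n;
}

float dot_peeled(const float *a, const float *b, size_t n)
{
    float acc[LANES] = {0};   /* one partial sum per vector lane */
    float sum = 0.0f;
    size_t i = 0;
    size_t peel = peel_count(a, n);

    /* Scalar prologue: advance until a + i is 32-byte aligned. */
    for (; i < peel; ++i)
        sum += a[i] * b[i];

    /* Main body: a + i is aligned here, so a SIMD version could use an
     * aligned load for a and an unaligned load for b. */
    for (; i + LANES <= n; i += LANES)
        for (size_t k = 0; k < LANES; ++k)
            acc[k] += a[i + k] * b[i + k];

    /* Scalar epilogue: fewer than LANES elements remain. */
    for (; i < n; ++i)
        sum += a[i] * b[i];

    /* Horizontal reduction of the per-lane partial sums. */
    for (size_t k = 0; k < LANES; ++k)
        sum += acc[k];
    return sum;
}
```

The first option needs none of this bookkeeping: the main loop starts at index 0 and uses unaligned loads for both operands.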
