Re: SIMD implementation of dot-product. Benchmarks

Ilya Yaroshenko Sat, 17 Aug 2013 22:35:47 -0700

On Sunday, 18 August 2013 at 05:26:00 UTC, Manu wrote:

movups is not good. It'll be a lot faster (and portable) if you use movaps.
Process looks something like:
  * do the first few from a[0] until a's alignment interval as scalar
  * load the left of b's aligned pair
  * loop for each aligned vector in a
    - load a[n..n+4] aligned
    - load the right of b's pair
    - combine left~right and shift left to match elements against a
    - left = right
  * perform stragglers as scalar
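
The process above could be sketched in C with SSE2 intrinsics roughly as follows. This is only an illustrative sketch, not code from the thread: the function name dot_aligned and the two helper macros are hypothetical, both pointers are assumed to be at least 4-byte (float) aligned, and the aligned loads of b may touch a few floats just outside b[0..n) within the same 16-byte block, which is harmless on x86 but outside what the C standard guarantees.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; x86 / x86-64 only */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: drop K floats from the front of l and append the
   first K floats of r. K must be a compile-time constant (1..3). */
#define CONCAT_SHIFT(l, r, K)                                   \
    _mm_castsi128_ps(_mm_or_si128(                              \
        _mm_srli_si128(_mm_castps_si128(l), 4 * (K)),           \
        _mm_slli_si128(_mm_castps_si128(r), 16 - 4 * (K))))

/* Hypothetical helper: main loop for b sitting K floats past a
   16-byte boundary. Every load here is an aligned load. */
#define DOT_LOOP(K)                                             \
    for (; i + 4 <= n; i += 4, bp += 4) {                       \
        __m128 right = _mm_load_ps(bp + 4);                     \
        __m128 bv = CONCAT_SHIFT(left, right, K);               \
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_load_ps(a + i), bv)); \
        left = right;                                           \
    }

float dot_aligned(const float *a, const float *b, size_t n)
{
    /* Assumption: both pointers are naturally aligned for float. */
    assert(((uintptr_t)a & 3) == 0 && ((uintptr_t)b & 3) == 0);

    float sum = 0.0f;
    size_t i = 0;

    /* Scalar prologue until a + i reaches a 16-byte boundary. */
    while (i < n && ((uintptr_t)(a + i) & 15) != 0) {
        sum += a[i] * b[i];
        ++i;
    }

    if (i + 4 <= n) {
        /* Offset of b + i inside its 16-byte block, in floats (0..3). */
        size_t k = ((uintptr_t)(b + i) & 15) / sizeof(float);
        const float *bp = b + i - k;      /* aligned block holding b[i] */
        __m128 acc = _mm_setzero_ps();
        __m128 left = _mm_load_ps(bp);    /* left half of b's aligned pair */

        switch (k) {
        case 0:  /* b is aligned too: plain aligned loads suffice */
            for (; i + 4 <= n; i += 4)
                acc = _mm_add_ps(acc, _mm_mul_ps(_mm_load_ps(a + i),
                                                 _mm_load_ps(b + i)));
            break;
        case 1: DOT_LOOP(1) break;
        case 2: DOT_LOOP(2) break;
        default: DOT_LOOP(3) break;
        }

        /* Horizontal sum of the four partial sums. */
        float tmp[4];
        _mm_storeu_ps(tmp, acc);
        sum += tmp[0] + tmp[1] + tmp[2] + tmp[3];
    }

    /* Stragglers as scalar. */
    for (; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

Here _mm_load_ps corresponds to movaps. The byte-shift intrinsics need a constant shift count, which is why the sketch uses one loop per possible offset of b rather than a single loop with a runtime shift.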
Your benchmark is probably misleading too, because I suspect you are
passing directly alloc-ed arrays into the function (which are 16 byte
aligned).
movups will be significantly slower if the pointers supplied are not 16
byte aligned.
Also, results vary significantly between chip manufacturers and revisions.


I'll try =). Thank you very much!
