Automatic or manual vectorization can also pack twice as many `float32` numbers as `float64` into each vector instruction on x86, just as in the ARM and GPU cases. You may need `-march=native` or `-mavx` compiler flags (or manual intrinsics/assembly) to activate that, though, rather than targeting some lowest-common-denominator x86 CPU, and C compiler autovectorization can be finicky.
It is true that many calculations are memory-bandwidth bound, and there you still get the 2x improvement (half the bytes moved). However, many are not membw bound, or their working sets fit in fast caches, and for those the 2x-wider vectors are what help. (Funny: caches used to be almost entirely about latency, but in recent times they've become about both latency and bandwidth.) Obviously, the wrong answer faster is not helpful, but when `float32` precision suffices it is often close to 2x faster, depending on how vectorizable your workload is, the compiler, and the compiler flags (and/or manual assembly). Excess precision is also not helpful if its cost is not minimal.
