Re: Does dmd have SSE intrinsics?

Jeremie Pelletier Tue, 22 Sep 2009 11:20:30 -0700

Robert Jacques wrote:

On Tue, 22 Sep 2009 12:09:23 -0400, Jeremie Pelletier <[email protected]> wrote:
#ponce wrote:
In practice it's about an 8X speed difference!

On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops.
On i7, movups on aligned data is the same speed as movaps. It's still slower if it's an unaligned access.
It all depends on how important you think performance on Core2 and earlier Intel processors is.
I wasn't aware of that, and here I was wondering why my SSE code was slower than the FPU in certain places on my Core2 Quad. I now recall using a lot of movups instructions. Thanks for the tip.
Indeed SSE is known to be overkill when dealing with unaligned data.
In C++, writing SSE code is so painful that you either have to use intrinsics or use libraries like Eigen (a SIMD vectorization library based on expression templates, which can generate SSE, AVX or FPU code). But using such a library is often way too intrusive, and alignment is not in standard C++.
D already understands array operations like Eigen does, in order to increase cacheability. It would be great if it could statically detect 16-byte aligned data and perform SSE when possible (though there must be many other things to do :) ).
The D memory manager already aligns data on 16-byte boundaries. The only case I can think of right now is when data is in a struct or class:
struct Example {
    float[4] vec1; // offset 0: aligned!
    int a;         // offset 16
    float[4] vec2; // offset 20: unaligned!
}
Yes, although classes have hidden vars, which are runtime dependent, changing the offset. Structs may be embedded in other things (therefore offset). And then there's the whole slicing from an array issue.
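
To make the offset and slicing points concrete, here is a minimal sketch in D. The names `Particle` and `isAligned16` are made up for illustration and are not from the thread. It only checks field offsets at compile time and pointer alignment at run time; it assumes nothing about where the allocator actually places the array.

```d
import std.stdio;

// Hypothetical struct mirroring the layout discussed above.
struct Particle
{
    float[4] pos; // offset 0
    int id;       // offset 16
    float[4] vel; // offset 20, so not a multiple of 16
}

// Hypothetical helper: true if the address is a multiple of 16.
bool isAligned16(const(void)* p)
{
    return (cast(size_t) p & 15) == 0;
}

void main()
{
    // Field offsets are known at compile time.
    static assert(Particle.pos.offsetof == 0);
    static assert(Particle.vel.offsetof == 20);
    writeln("vel offset: ", Particle.vel.offsetof);

    // Slicing moves the start pointer by 4 bytes per float, so a
    // slice starting at index 1 cannot share 16-byte alignment
    // with the start of the array.
    float[] data = new float[8];
    float[] tail = data[1 .. 5];
    assert(!(isAligned16(data.ptr) && isAligned16(tail.ptr)));
    writeln("data aligned: ", isAligned16(data.ptr));
    writeln("tail aligned: ", isAligned16(tail.ptr));
}
```

The run-time check is a single AND and compare, but it still has to happen per call, which is the overhead being weighed here.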

Ah yes, you are right. Then I guess it really is up to the programmer to know if the data is aligned or not and select different code paths from it. Adding checks at runtime just adds to the overhead we're trying to save by using SSE in the first place.

It would be great if we could declare aliases to asm instructions and use template functions with a (bool aligned = true) and set a movps alias to either movaps or movups depending on the value of aligned.
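
Something close to that can be approximated today with compile-time strings. The sketch below is only an illustration: `movps` and `copy4Asm` are hypothetical names, and the register choice (EAX as source, EDX as destination) is an assumption. It just builds the instruction text from the `aligned` parameter. Feeding that text to the inline assembler through a string mixin is not shown and would need to be verified against the compiler and target in use.

```d
import std.stdio;

// Hypothetical "alias": picks the mnemonic at compile time.
template movps(bool aligned)
{
    enum string movps = aligned ? "movaps" : "movups";
}

// Hypothetical generator: returns asm source text that copies four
// floats from [EAX] to [EDX] through XMM0, using the aligned or
// unaligned move depending on the template parameter.
string copy4Asm(bool aligned = true)()
{
    enum mov = movps!aligned;
    return mov ~ " XMM0, [EAX];\n" ~ mov ~ " [EDX], XMM0;";
}

void main()
{
    // The selection happens entirely at compile time.
    static assert(movps!true == "movaps");
    static assert(movps!false == "movups");

    writeln(copy4Asm!true());
    writeln(copy4Asm!false());
}
```

Since the choice is a template parameter, each instantiation contains only one of the two instructions and no run-time branch is needed.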
