Re: Does dmd have SSE intrinsics?

bearophile Mon, 21 Sep 2009 14:45:12 -0700

Don:
> (1) They don't take advantage of fixed-length arrays. In particular, 
> operations on float[4] should be a single SSE instruction (no function 
> call, no loop, nothing). This will make a huge difference to game and 
> graphics programmers, I believe.
[...]
> It's issue (1) which is the killer.


In my answer I forgot to mention one more small thing.

D's std.gc.malloc() returns pointers aligned to 16 bytes. (I'd like a second 
argument on that GC malloc to specify the alignment; it could be used to save 
some memory when 16-byte alignment isn't necessary.) On the other hand, I think 
std.c.stdlib.malloc() doesn't guarantee pointers aligned to 16 bytes.
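A quick way to see what a given runtime actually returns is to test the low four bits of the pointer. This is only a sketch: it assumes the D2 module names (core.memory and core.stdc.stdlib rather than std.gc and std.c.stdlib), and isAligned16 is a helper name made up for the example. The results depend on the platform and runtime, so they are printed rather than assumed.

```d
import core.memory : GC;
import core.stdc.stdlib : malloc, free;
import std.stdio : writeln;

// Hypothetical helper: true when the address is a multiple of 16.
bool isAligned16(const(void)* p)
{
    return (cast(size_t) p & 15) == 0;
}

void main()
{
    void* g = GC.malloc(64);   // block managed by the D garbage collector
    void* c = malloc(64);      // block from the C runtime
    scope (exit) free(c);

    // Platform- and runtime-dependent, so just report what we got.
    writeln("GC.malloc aligned to 16: ", isAligned16(g));
    writeln("C malloc aligned to 16:  ", isAligned16(c));
}
```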

In the following code, if you want to implement the last line with a single 
vector instruction, then the arrays a and b have to be aligned to 16 bytes. I 
think that LDC currently doesn't align a and b to 16 bytes.

float[4] a = [1.0f, 2.0f, 3.0f, 4.0f];
float[4] b = 10f;
float[4] c;
c[] = a[] + b[];

So you may need a syntax like the following, which isn't handy:

align(16) float[4] a = [1.0f, 2.0f, 3.0f, 4.0f];
align(16) float[4] b = 10f;
align(16) float[4] c;
c[] = a[] + b[];
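Where the compiler gives no alignment guarantee for stack arrays, one manual workaround is to over-allocate and take a 4-float slice that starts on a 16-byte boundary. This is only a sketch of that idea, not something proposed above: floatsToBoundary and the 7-element buffers are made up for the example, and it assumes float storage is at least 4-byte aligned. Whether the compiler then emits a single vector instruction for the last operation is a separate matter.

```d
// Hypothetical helper: how many floats to skip from p to reach
// the next 16-byte boundary (0 if p is already on one).
size_t floatsToBoundary(const(float)* p)
{
    immutable misalign = cast(size_t) p & 15;
    return ((16 - misalign) & 15) / float.sizeof;
}

void main()
{
    // 7 floats = 4 needed + up to 3 skipped to reach the boundary.
    float[7] sa, sb, sc;
    float[] a = sa[floatsToBoundary(sa.ptr) .. $][0 .. 4];
    float[] b = sb[floatsToBoundary(sb.ptr) .. $][0 .. 4];
    float[] c = sc[floatsToBoundary(sc.ptr) .. $][0 .. 4];

    a[] = [1.0f, 2.0f, 3.0f, 4.0f];
    b[] = 10f;
    c[] = a[] + b[];   // element-wise add on 16-byte aligned storage

    assert((cast(size_t) c.ptr & 15) == 0);
    assert(c == [11.0f, 12.0f, 13.0f, 14.0f]);
}
```

The cost is three wasted floats per array and noisier code, which is part of why automatic alignment would be nicer.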

A possible solution is to automatically align all static arrays allocated on 
the stack to 16 bytes too (by default; this could be changed to save stack 
space in specific situations) :-)
A note: future CPU vector instructions will probably relax their alignment 
requirements... it's already happening.

Bye,
bearophile
