On Mon, 21 Sep 2009 18:32:50 -0400, Jeremie Pelletier <[email protected]>
wrote:
bearophile wrote:
Don:
(1) They don't take advantage of fixed-length arrays. In particular,
operations on float[4] should be a single SSE instruction (no function
call, no loop, nothing). This will make a huge difference to game and
graphics programmers, I believe.
[...]
It's issue (1) which is the killer.
In my answer I forgot to mention another small thing.
The std.gc.malloc() of D returns pointers aligned to 16 bytes (though I
would like to add a second argument to the GC malloc to specify the
alignment, which could save some memory when the alignment isn't
necessary), while I think std.c.stdlib.malloc() doesn't guarantee
pointers aligned to 16 bytes.
In the following code, if you want to implement the last line with one
vector instruction, then the a and b arrays have to be aligned to 16
bytes. I think that LDC currently doesn't align a and b to 16 bytes.
float[4] a = [1.0f, 2.0f, 3.0f, 4.0f];
float[4] b = 10.0f;
float[4] c = a[] + b[];
So you may need syntax like the following, which isn't handy:
align(16) float[4] a = [1.0f, 2.0f, 3.0f, 4.0f];
align(16) float[4] b = 10.0f;
align(16) float[4] c = a[] + b[];
A possible solution is to automatically align all static arrays
allocated on the stack to 16 bytes too (by default; this could be
changed to save stack space in specific situations) :-)
A note: in the future CPU vector instructions will probably relax their
alignment requirements... it's already happening.
Bye,
bearophile
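A minimal sketch of how the alignment point above could be checked, assuming a current D2 compiler. The helper isAligned16 is hypothetical (it is not from the thread), and whether align(16) is honored for stack variables depends on the compiler and target, so the program reports the result instead of assuming it.

```d
import std.stdio : writeln;

/// Hypothetical helper: true if p lies on a 16-byte boundary.
bool isAligned16(const(void)* p)
{
    return (cast(size_t) p & 15) == 0;
}

void main()
{
    align(16) float[4] a = [1.0f, 2.0f, 3.0f, 4.0f];
    align(16) float[4] b = 10.0f;   // every element is set to 10
    align(16) float[4] c;
    c[] = a[] + b[];                // element-wise array operation

    writeln(c);
    // Report the alignment actually obtained for each array.
    writeln(isAligned16(a.ptr), " ", isAligned16(b.ptr), " ",
            isAligned16(c.ptr));
}
```

If any of the three checks prints false, a single aligned vector instruction could not be used safely for that array.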
That 16-byte alignment is a restriction of the current usage of bit
fields. Since every bit in the field indexes a single 16-byte block, a
simple shift of 4 bits to the right translates a pointer into its index
in
the bit field. You could align on 4-byte boundaries, but at the cost of
quadrupling the size of the bit fields, and possibly having slower
collection runs.
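The arithmetic described above can be sketched roughly as follows. This is not the actual GC code: the function names, the pool base address and the 1 MiB pool size are assumptions made for illustration, and the pointer is taken relative to the start of its pool.

```d
import std.stdio : writeln;

enum size_t blockShift = 4;   // log2(16): one bit per 16-byte block

/// Hypothetical: index of the bit that covers `address` in a pool
/// starting at `poolBase`.
size_t blockIndex(size_t poolBase, size_t address)
{
    return (address - poolBase) >> blockShift;
}

/// Hypothetical: bytes of bit field needed for one pool, one bit per
/// block, rounded up to whole bytes.
size_t bitmapBytes(size_t poolSize, size_t blockSize)
{
    return (poolSize / blockSize + 7) / 8;
}

void main()
{
    enum size_t poolSize = 1 << 20;          // assumed 1 MiB pool
    writeln(blockIndex(0x1000, 0x1040));     // offset 0x40, shifted right by 4
    writeln(bitmapBytes(poolSize, 16));
    writeln(bitmapBytes(poolSize, 8));
    writeln(bitmapBytes(poolSize, 4));
}
```

Each halving of the block size doubles the number of bits the collector has to store and scan per pool.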
Doesn't SSE have aligned and unaligned versions of its move
instructions, like MOVAPS and MOVUPS?
Yes, but the unaligned version is slower, even for aligned data.
Also, another issue for game/graphics/robotics programmers is the
ability to return fixed-length arrays from functions, though struct
wrappers mitigate this.
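For readers unfamiliar with the struct-wrapper workaround, here is a minimal sketch of what it might look like. The names Vec4 and splat are hypothetical and the operator syntax assumes D2; it only illustrates returning four floats by value through a struct.

```d
import std.stdio : writeln;

/// Hypothetical wrapper around a fixed-length array.
struct Vec4
{
    float[4] v;

    /// Element-wise addition using array operations.
    Vec4 opBinary(string op)(Vec4 rhs) const
        if (op == "+")
    {
        Vec4 r;
        r.v[] = v[] + rhs.v[];
        return r;
    }
}

/// Returns a vector by value with all four elements set to x.
Vec4 splat(float x)
{
    Vec4 r;
    r.v[] = x;
    return r;
}

void main()
{
    auto a = Vec4([1.0f, 2.0f, 3.0f, 4.0f]);
    auto c = a + splat(10.0f);   // the wrapper is returned by value
    writeln(c.v);
}
```

The cost of the wrapper is mostly syntactic: callers have to go through the .v member to reach the underlying array.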