Don:
> (1) They don't take advantage of fixed-length arrays. In particular,
> operations on float[4] should be a single SSE instruction (no function
> call, no loop, nothing). This will make a huge difference to game and
> graphics programmers, I believe.
[...]
> It's issue (1) which is the killer.

In my answer I forgot to mention one more small thing. D's std.gc.malloc() returns pointers aligned to 16 bytes (although I'd like it to take an optional second argument that specifies the alignment, which could save some memory when the alignment isn't necessary), while I think std.c.stdlib.malloc() doesn't guarantee pointers aligned to 16 bytes.

In the following code, if you want to implement the last line with one vector instruction, then the a and b arrays have to be aligned to 16 bytes. I think that LDC currently doesn't align a and b to 16 bytes.

    float[4] a = [1.f, 2., 3., 4.];
    float[4] b = 10f;
    float[4] c;
    c[] = a[] + b[];

So you may need a syntax like the following, which isn't handy:

    align(16) float[4] a = [1.f, 2., 3., 4.];
    align(16) float[4] b = 10f;
    align(16) float[4] c;
    c[] = a[] + b[];

A possible solution is to automatically align to 16 bytes all static arrays allocated on the stack too (by default, with a way to change it to save stack space in specific situations) :-)

A note: future CPUs will probably relax the alignment requirements of their vector instructions; it's already happening.

Bye,
bearophile
