Don wrote:
dsimcha wrote:
== Quote from Don's article
Jeremie Pelletier wrote:
While writing SSE assembly by hand in D is fun and works well, I'm
wondering if the compiler has intrinsics for the SSE instruction set,
much like xmmintrin.h in C.
The reason is that the compiler can usually reorder the intrinsics to
optimize performance.
I could always use C code to implement my SSE routines, but then I'd
lose the ability to inline them in D.
I know this is an old post, but since it wasn't answered...
Make sure you know what the SSE intrinsics actually *do* in VC++/Intel!
I've read many complaints about how poorly they perform on all compilers
-- the penalty for allowing them to be reordered is that extra
instructions are often added, which means that straightforward C code is
sometimes faster!
In this regard, I'm personally excited about array operations. I think
the need for SSE intrinsics and vectorisation is a result of abstraction
inversion: the instruction set is higher-level than the "high-level
language"! Array operations allow D to catch up with asm again. When
array operations get implemented properly, it'll be interesting to see
how much need for SSE intrinsics remains.
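As an illustration of the array operations mentioned above, here is a minimal sketch. The function name `scaleAdd` and the test values are made up for this example; whether the statement is actually compiled to SSE depends on the compiler and runtime, which is the point under discussion.

```d
// Hypothetical helper: dst[i] = a[i] * k + b[i] for every element,
// written as one array operation instead of an explicit loop.
void scaleAdd(float[] dst, const(float)[] a, const(float)[] b, float k)
{
    // Array operations require all operands to have the same length.
    assert(dst.length == a.length && a.length == b.length);

    // The implementation is free to vectorise this statement.
    dst[] = a[] * k + b[];
}

unittest
{
    float[] a = [1.0f, 2.0f, 3.0f, 4.0f, 5.0f];
    float[] b = [10.0f, 20.0f, 30.0f, 40.0f, 50.0f];
    auto r = new float[a.length];

    scaleAdd(r, a, b, 2.0f);
    assert(r == [12.0f, 24.0f, 36.0f, 48.0f, 60.0f]);
}
```

The source states only what should happen elementwise; no instruction selection is spelled out, so the same line can map to scalar or vector code.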
What's wrong with the current implementation of array ops (other than
a few misc. bugs that have already been filed)? I thought they already
use SSE if available.
(1) They don't take advantage of fixed-length arrays. In particular,
operations on float[4] should be a single SSE instruction (no function
call, no loop, nothing). This will make a huge difference to game and
graphics programmers, I believe.
(2) The operations don't block on cache size.
(3) DMD doesn't allow you to generate code that assumes a minimum set
of CPU capabilities. (In fact, when generating inline asm, the CPU type
is 8086! This is in bugzilla.) This limits the possible use of (1).
It's issue (1) which is the killer.
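To make issue (1) concrete, here is a small sketch of an array operation on fixed-length arrays. The name `add4` is hypothetical. Because the length is part of the type, a compiler could in principle emit a single packed add here; the sketch does not claim that any particular compiler does.

```d
// Hypothetical helper: component-wise sum of two 4-element vectors.
// The length (4) is known at compile time, so no loop count has to be
// checked at run time.
float[4] add4(float[4] a, float[4] b)
{
    float[4] r;
    r[] = a[] + b[];   // array operation on static arrays
    return r;
}

unittest
{
    float[4] a = [1.0f, 2.0f, 3.0f, 4.0f];
    float[4] b = [0.5f, 0.5f, 0.5f, 0.5f];

    float[4] r = add4(a, b);
    assert(r == [1.5f, 2.5f, 3.5f, 4.5f]);
}
```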
I agree that a -arch switch of some sort would be the best thing to hit
dmd. It is already very useful in gcc, which supported targets up to
core2 when I last used it.
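Without a compile-time switch, the usual fallback is a runtime check. The sketch below is that different technique, not the -arch switch itself: it assumes druntime's `core.cpuid` module is available, and the function name `bestPath` is made up for illustration.

```d
import core.cpuid : sse, sse2;

// Hypothetical helper: reports which code path a library could select
// at run time, based on what the CPU says it supports.
string bestPath()
{
    if (sse2)
        return "sse2";
    if (sse)
        return "sse";
    return "scalar";
}

unittest
{
    auto p = bestPath();
    assert(p == "sse2" || p == "sse" || p == "scalar");
}
```

A runtime check like this costs a branch or an indirect call per operation, which is why it does not help with small fixed-length cases such as float[4].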
I wrote a linear algebra module with support for 2D,3D,4D vectors,
quaternions, 3x2 and 4x4 matrices, all with template structs so I can
declare them for float, double, or real components. I used SSE for the
bigger operations, which increased the module size considerably. This is
where I first started looking for SSE intrinsics. It would also be
greatly helpful if the compiler could generate SSE code by itself; it
would save a LOT of inline assembly for simple operations.
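For readers unfamiliar with the approach, here is a rough sketch of what a template struct of this kind might look like when it relies on array operations rather than inline assembly. It is not the module described above: `Vec`, `dot`, `Vec4f` and `Vec3d` are all hypothetical names.

```d
// Hypothetical vector type, generic over component type T and size N.
struct Vec(T, size_t N)
{
    T[N] data;

    // Component-wise + and -, expressed as array operations so that
    // code generation is left to the compiler.
    Vec opBinary(string op)(Vec rhs)
        if (op == "+" || op == "-")
    {
        Vec r;
        mixin("r.data[] = data[] " ~ op ~ " rhs.data[];");
        return r;
    }

    // Dot product as a plain loop over the components.
    T dot(Vec rhs)
    {
        T sum = 0;
        foreach (i; 0 .. N)
            sum += data[i] * rhs.data[i];
        return sum;
    }
}

alias Vec4f = Vec!(float, 4);
alias Vec3d = Vec!(double, 3);

unittest
{
    auto a = Vec4f([1.0f, 2.0f, 3.0f, 4.0f]);
    auto b = Vec4f([4.0f, 3.0f, 2.0f, 1.0f]);

    auto s = a + b;
    assert(s.data == [5.0f, 5.0f, 5.0f, 5.0f]);
    assert(a.dot(b) == 20.0f);
}
```

The same struct body serves float, double and real; only the alias changes.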