On 9/22/11 1:39 AM, Don wrote:
On 22.09.2011 05:24, a wrote:
How would one do something like this without intrinsics (the code is
C++ using gcc vector extensions):

[snip]
At present, you can't do it without ultimately resorting to inline asm.
But, what we've done is to move SIMD into the machine model: the D
machine model assumes that float[4] + float[4] is a more efficient
operation than a loop.
Currently, only arithmetic operations are implemented, and on DMD at
least, they're still not proper intrinsics. So in the long term it'll be
possible to do it directly, but not yet.
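
For reference, the array-operation syntax in question looks roughly like the sketch below. The helper name add4 is made up for illustration, and whether this is lowered to SIMD instructions depends on the compiler and target; the code only shows the source-level form.

```d
// add4 is a hypothetical helper, not a library function.
float[4] add4(float[4] a, float[4] b)
{
    float[4] c;
    // Whole-array addition, written without an explicit loop.
    c[] = a[] + b[];
    return c;
}

void main()
{
    float[4] a = [1.0f, 2.0f, 3.0f, 4.0f];
    float[4] b = [10.0f, 20.0f, 30.0f, 40.0f];
    assert(add4(a, b) == [11.0f, 22.0f, 33.0f, 44.0f]);
}
```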

At various times, several of us have implemented 'swizzle' using CTFE,
giving you a syntax like:

float[4] x, y;
x[] = y[].swizzle!"cdcd"();
// x[0]=y[2], x[1]=y[3], x[2]=y[2], x[3]=y[3]

which compiles to a single shufps instruction.

That "cdcd" string is really a tiny DSL: the language consists of four
characters, each of which is a, b, c, or d.
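
A minimal sketch of how such a pattern string can be parsed with CTFE, assuming a plain scalar fallback rather than the shufps version described above. The names swizzleIndices and this particular swizzle are illustrative only and are not taken from any existing implementation.

```d
// Hypothetical helper: turns a pattern such as "cdcd" into source indices.
// It is an ordinary function, so it can run under CTFE.
size_t[4] swizzleIndices(string pattern) pure
{
    assert(pattern.length == 4, "swizzle pattern needs exactly four characters");
    size_t[4] idx;
    foreach (i, char ch; pattern)
    {
        assert(ch >= 'a' && ch <= 'd', "swizzle pattern may only use a, b, c, d");
        idx[i] = cast(size_t)(ch - 'a');
    }
    return idx;
}

// Scalar stand-in for the real thing: the pattern is decoded at compile
// time, so a malformed pattern stops compilation instead of failing at run time.
float[4] swizzle(string pattern)(const(float)[] src)
{
    static immutable size_t[4] idx = swizzleIndices(pattern); // forced CTFE
    assert(src.length == 4);
    float[4] result;
    foreach (i; 0 .. 4)
        result[i] = src[idx[i]];
    return result;
}

void main()
{
    float[4] y = [1.0f, 2.0f, 3.0f, 4.0f];
    float[4] x = y[].swizzle!"cdcd"();
    // x[0]=y[2], x[1]=y[3], x[2]=y[2], x[3]=y[3]
    assert(x == [3.0f, 4.0f, 3.0f, 4.0f]);
}
```

A production version would emit the shuffle instruction directly; this sketch only covers the parsing and checking half of the DSL.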

I think we should put swizzle in std.numeric once and for all. Is anyone interested in taking up that task?

A couple of years ago I made a DSL compiler for BLAS1 operations. It was
capable of doing some pretty wild stuff, even then. (The DSL looked like
normal D code).
But the compiler has improved enormously since that time. It's now
perfectly feasible to make a DSL for the SIMD operations you need.

The really nice thing about this, compared to normal asm, is that you
have access to the compiler's symbol table. This lets you add
compile-time error messages, for example.
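
As a rough illustration of that point (a sketch only, with addInto being a made-up name): library code can inspect operand types and report a tailored message at compile time, which a raw asm block cannot do.

```d
// Hypothetical library routine: checks its operand types during compilation
// and reports a readable message if they are not float[4].
void addInto(Dst, A, B)(ref Dst dst, ref const A a, ref const B b)
{
    static assert(is(Dst == float[4]) && is(A == float[4]) && is(B == float[4]),
        "addInto expects float[4] operands, got " ~ Dst.stringof ~ ", "
        ~ A.stringof ~ " and " ~ B.stringof);
    dst[] = a[] + b[];
}

void main()
{
    float[4] a = [1.0f, 2.0f, 3.0f, 4.0f];
    float[4] b = [0.5f, 0.5f, 0.5f, 0.5f];
    float[4] sum;
    addInto(sum, a, b);
    assert(sum == [1.5f, 2.5f, 3.5f, 4.5f]);
    // Passing, say, a double[4] as any operand should be rejected during
    // compilation with the message above rather than at run time.
}
```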

A funny thing about this, which I found after working on the DMD
back-end, is that it is MUCH easier to write an optimizer/code generator in
a DSL in D, than in a compiler back-end.

A good argument for (a) moving stuff from the compiler into the library, (b) continuing Don's great work on making CTFE a solid proposition.


Andrei
