David Nadlinger Wrote:
> On 12/29/11 2:13 PM, a wrote:
> > void test(ref V a, ref V b)
> > {
> > asm
> > {
> > movaps XMM0, a;
> > addps XMM0, b;
> > movaps a, XMM0;
> > }
> > asm
> > {
> > movaps XMM0, a;
> > addps XMM0, b;
> > movaps a, XMM0;
> > }
> > }
> >
> > […]
> >
> > The needless loads and stores would make it impossible to write an efficient
> > simd add function even if the functions containing asm blocks could be
> > inlined.
>
> Yes, this is indeed a problem, and as far as I'm aware, usually solved
> in the gamedev world by using the (SSE) intrinsics your favorite C++
> compiler provides, instead of resorting to inline asm.
>
> David
IIRC Walter doesn't want to add vector intrinsics, so it would be nice if
functions doing vector operations could be written efficiently using inline
assembly. That would also be a more general solution than intrinsics.
Something like that is possible with gcc extended inline assembly, where the
asm operands are bound to C expressions and the compiler chooses the
registers, so values can stay in registers between asm statements. For
example, this:
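typedef float v4sf __attribute__((vector_size(16)));
/* *a += *b with one addps; "x" means any SSE register and "0" puts the
   input *a in the same register as output operand 0 */
void vadd(v4sf *a, v4sf *b)
{
    asm(
        "addps %1, %0"
        : "=x" (*a)
        : "x" (*b), "0" (*a)
        : );
}
/* a and b must be 16-byte aligned: accesses through v4sf* may use
   aligned loads and stores (movaps), as in the listing below */
void test(float * __restrict__ a, float * __restrict__ b)
{
    v4sf * va = (v4sf*) a;
    v4sf * vb = (v4sf*) b;
    vadd(va,vb);
    vadd(va,vb);
    vadd(va,vb);
    vadd(va,vb);
}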
compiles to:
00000000004004c0 <test>:
4004c0: 0f 28 0e movaps (%rsi),%xmm1
4004c3: 0f 28 07 movaps (%rdi),%xmm0
4004c6: 0f 58 c1 addps %xmm1,%xmm0
4004c9: 0f 58 c1 addps %xmm1,%xmm0
4004cc: 0f 58 c1 addps %xmm1,%xmm0
4004cf: 0f 58 c1 addps %xmm1,%xmm0
4004d2: 0f 29 07 movaps %xmm0,(%rdi)
This should also be possible with GDC, but I couldn't figure out how to get
an equivalent of __restrict__ there, which is what tells gcc that a and b
don't alias and lets it keep both vectors in registers across the four asm
statements. If you want to use vector types and gcc extended inline assembly
with GDC, see
http://www.digitalmars.com/d/archives/D/gnu/Support_for_gcc_vector_attributes_SIMD_builtins_3778.html
and https://bitbucket.org/goshawk/gdc/wiki/UserDocumentation.