On 8/01/12 5:02 PM, Martin Nowak wrote:
> simdop will need more overloads, e.g. some
> instructions need immediate bytes.
> z = simdop(SHUFPS, x, y, 0);
>
> How about this:
> __v128 simdop(T...)(SIMD op, T args);

Many of these instructions don't make much sense as functions that return a value, e.g.

__v128 a, b;
a = simdop(movhlps, b); // ???

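movhlps moves the top 64 bits of b into the bottom 64 bits of a and leaves the top 64 bits of a unchanged. That can't be written as an expression like this, because the result depends on the old value of a as well as on b.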
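To make the point concrete, here is a scalar model of what movhlps does, written in plain D with float[4] standing in for a register. The name movhlpsModel is made up for illustration; this is not real SIMD code and not part of any proposed API, just a sketch of the instruction's semantics.

```d
import std.stdio : writeln;

// Scalar model of "movhlps dst, src" (illustrative only, no real SIMD):
// the low two lanes of dst receive the high two lanes of src,
// and the high two lanes of dst keep their old contents.
void movhlpsModel(ref float[4] dst, const float[4] src)
{
    dst[0] = src[2];
    dst[1] = src[3];
    // dst[2] and dst[3] are deliberately left untouched.
}

void main()
{
    float[4] a = [1.0f, 2.0f, 3.0f, 4.0f];
    float[4] b = [5.0f, 6.0f, 7.0f, 8.0f];

    movhlpsModel(a, b);

    // Low half now comes from b's high half; high half is still a's own.
    float[4] expected = [7.0f, 8.0f, 3.0f, 4.0f];
    assert(a == expected);
    writeln(a);
}
```

The destination is both read and written, which is why a one-argument form that only takes b has nowhere to get the upper half of the result from.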

It would make more sense to write the instructions just as they appear in asm, with the destination as the first operand:

simdop(movhlps, a, b);
simdop(addps, a, b);
etc.

The difference between this and inline asm would be:

1. Registers are automatically allocated.
2. Loads/stores are inserted when we spill to stack.
3. Instructions can be scheduled and optimised by the compiler.

We could then extend this with user-defined types:

struct float4
{
  union
  {
     __v128 v;
     float[4] for_debugging;
  }

  float4 opBinary(string op:"+")(float4 rhs) @forceinline
  {
    __v128 result = v;
    simdop(addps, result, rhs.v); // operate on the wrapped vector, not the struct
    return float4(result);
  }
}

For this to work well, though, we'd need a strong guarantee of inlining and of the removal of redundant loads/stores. We'd also need a guarantee that a float4 gets the same treatment as a __v128 (since the __v128 is its only member).
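As a rough check that the wrapper pattern itself holds together at the language level, here is a version that compiles with plain D today. It assumes nothing about the real feature: V128 is just an alias for float[4], addpsModel is a made-up scalar stand-in for a two-operand addps, and there is no forced inlining, so it says nothing about the codegen guarantees discussed above.

```d
// Stand-in for the proposed __v128 type: a plain array, no real SIMD.
alias V128 = float[4];

// Hypothetical scalar model of a two-operand addps: dst += src, lane by lane.
void addpsModel(ref V128 dst, const V128 src)
{
    foreach (i; 0 .. 4)
        dst[i] += src[i];
}

struct Float4
{
    V128 v;

    Float4 opBinary(string op : "+")(Float4 rhs) const
    {
        V128 result = v;            // copy the destination operand first
        addpsModel(result, rhs.v);  // then apply the op in place
        return Float4(result);
    }
}

void main()
{
    auto x = Float4([1.0f, 2.0f, 3.0f, 4.0f]);
    auto y = Float4([10.0f, 20.0f, 30.0f, 40.0f]);

    auto z = x + y;

    V128 expected = [11.0f, 22.0f, 33.0f, 44.0f];
    assert(z.v == expected);
}
```

The copy into result is exactly the kind of redundant move that the compiler would have to eliminate for the real thing to be competitive with hand-written asm.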
