On Fri, 06 Jan 2012 14:44:53 +0100, Manu <[email protected]> wrote:

On 6 January 2012 14:56, Martin Nowak <[email protected]> wrote:

On Fri, 06 Jan 2012 09:43:30 +0100, Walter Bright <
[email protected]> wrote:

One caveat is it is typeless; a __v128 could be used as 4 packed ints or
2 packed doubles. One problem with making it typed is it'll add 10 more
types to the base compiler, instead of one. Maybe we should just bite the
bullet and do the types:

    __vdouble2
    __vfloat4
    __vlong2
    __vulong2
    __vint4
    __vuint4
    __vshort8
    __vushort8
    __vbyte16
    __vubyte16


Those could be typedefs, i.e. alias this wrapper.
Still simdop would not be typesafe.


I think they should be well-defined structs with lots of type safety and
sensible methods, not just a typedef of the typeless primitive.


As much as this proposal presents a viable solution,
why not spend the time to extend inline asm instead?


I think there are too many risky problems with the inline assembler (as
raised in my discussion about supporting pseudo registers in inline asm
blocks).
  * No way to allow the compiler to assign registers (pseudo registers)
That's what I propose he should do. IMHO it would be a huge improvement if
register variables could be used directly in asm.

int a, b;
__vec128 c;

asm (a, b, c)
{
    mov EAX, a;
    add b, EAX;
    movps XMM1, c;
    mulps c, XMM1;
}

The compiler has enough knowledge to do this, and it's the common basic block spilling
scheme that is used here.
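
For comparison, GCC's extended asm already lets the compiler pick the
registers for named operands, which is close to the asm (a, b, c) idea.
A minimal sketch in C, assuming GCC or Clang on x86; the name
add_with_asm is made up for illustration, and other compilers or targets
take the plain C path:

```c
#include <assert.h>

/* Hypothetical helper: returns a + b. On x86 with GCC/Clang the add is
   done in inline asm, with registers chosen by the compiler. */
int add_with_asm(int a, int b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* "+r"(b): b is read and written in any general-purpose register.
       "r"(a):  a is an input in any general-purpose register.
       AT&T operand order: source first, destination second. */
    __asm__("addl %1, %0" : "+r"(b) : "r"(a) : "cc");
#else
    b += a; /* portable fallback for other compilers/targets */
#endif
    return b;
}

/* Small self-check; call it from main() to try the sketch. */
void check_add_with_asm(void)
{
    assert(add_with_asm(2, 3) == 5);
    assert(add_with_asm(-7, 7) == 0);
}
```

Because the operands are declared to the compiler, it can keep a and b in
whatever registers they already live in instead of forcing EAX.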

There is another benefit.
Consider the following:

__vec128 addps(__vec128 a, __vec128 b) pure
{
    __vec128 res = a;

    if (__ctfe)
    {
        foreach(i; 0 .. 4)
           res[i] += b[i];
    }
    else
    {
        asm (b, res)
        {
            addps res, b;
        }
    }
    return res;
}
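
The same split between a hardware path and a scalar reference path can be
sketched in C. This assumes the GCC/Clang vector extension; v4sf and
addps_sketch are illustrative names, and the scalar loop only loosely
stands in for the __ctfe branch, since C has no compile-time function
evaluation:

```c
#include <assert.h>

/* Four packed floats, using the GCC/Clang vector extension. */
typedef float v4sf __attribute__((vector_size(16)));

v4sf addps_sketch(v4sf a, v4sf b)
{
    v4sf res = a;
#if defined(__SSE__) && (defined(__x86_64__) || defined(__i386__))
    /* "x" asks for any SSE register; the compiler does the allocation. */
    __asm__("addps %1, %0" : "+x"(res) : "x"(b));
#else
    /* Scalar reference path, element by element. */
    for (int i = 0; i < 4; ++i)
        res[i] += b[i];
#endif
    return res;
}

/* Small self-check; call it from main() to try the sketch. */
void check_addps_sketch(void)
{
    v4sf a = { 1.0f, 2.0f, 3.0f, 4.0f };
    v4sf b = { 10.0f, 20.0f, 30.0f, 40.0f };
    v4sf r = addps_sketch(a, b);
    assert(r[0] == 11.0f && r[1] == 22.0f);
    assert(r[2] == 33.0f && r[3] == 44.0f);
}
```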

  * Assembly blocks present problems for the optimiser; it's not reliable
that it can optimise around inline asm blocks. How bad will it be when
trying to optimise around 100 small inlined functions each containing its
own inline asm blocks?
What do you mean by optimizing around? I don't see any apparent reason why that
should perform worse than using intrinsics.

The only implementation issue could be that lots of inlined asm snippets
create plenty of basic blocks, which could slow down certain compiler algorithms.

  * D's inline assembly syntax has to be carefully translated to GCC's
inline asm format when using GCC, and this needs to be done
PER-ARCHITECTURE, which Iain should not be expected to do for all the
obscure architectures GCC supports.

???
This would be needed for opcodes as well. Your initial goal was to directly
influence code gen down to the instruction level; how should that be achieved
without platform-specific extensions? Quite the contrary: with both ops and
asm he will need two hack paths into GCC's codegen.

What I see here is that we can do a lot of good for the inline
assembler while achieving the same goal.
With intrinsics on the other hand we're adding a very specialized
maintenance burden.

What would be needed?
 - Implement the asm allocation logic.
 - Functions containing asm statements should participate in inlining.
 - Determining inline cost of asm statements.


I raised these points in my other thread; I think these are all far more
complicated problems than exposing opcode intrinsics would be.
Opcode intrinsics are almost certainly the way to go.

When used with typedefs for __vubyte16 et al., this would
allow a really clean and simple library implementation of intrinsics.


The type safety you're imagining here might actually be annoying when
working with the raw type and opcodes.
Consider this common situation and the code that will be built around it:
__v128 vec = { floatX, floatY, floatZ, unsigned int packedColour }; // pack some other useful data in W

That is really not a good idea if the bit pattern of packedColour is a denormal.
How can you even execute a single useful command on the floats here?

Also, mixing integer and FP instructions on the same register may
cause performance degradation. The registers are indeed typed internally by the CPU.

If vec were strongly typed, I would now need to start casting all over the
place to use various float and uint opcodes on this value?
I think it's correct when using SIMD at the raw level to express the type
as it is, typeless... SIMD regs are in fact typeless regs; they only gain a
concept of type the moment you perform an opcode on them, and only for the
duration of that opcode.

You will get your strong type safety when you make use of the float4 types
which will be created in the libs.
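
To make the denormal remark concrete, here is a small portable C sketch.
The colour value 0x00408020 is an arbitrary example, not taken from the
thread, and is_denormal_bits is a made-up helper. It assumes float is a
32-bit IEEE 754 single. A packed colour whose top byte is zero and whose
next byte is below 0x80 has an all-zero exponent field, so it reads as a
denormal when the W lane is treated as a float:

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the 32 raw bits, reinterpreted as an IEEE 754 single
   (as a float opcode would see the W lane), form a denormal value. */
int is_denormal_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f); /* well-defined type punning in C */
    return fpclassify(f) == FP_SUBNORMAL;
}

/* Small self-check; call it from main() to try the sketch. */
void check_denormal(void)
{
    assert(is_denormal_bits(0x00408020u));   /* exponent field is zero */
    assert(!is_denormal_bits(0x3F800000u));  /* bit pattern of 1.0f */
    assert(!is_denormal_bits(0x00000000u));  /* zero is not a denormal */
}
```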
