Re: SIMD support...

Martin Nowak Fri, 06 Jan 2012 19:27:56 -0800

On Sat, 07 Jan 2012 01:06:21 +0100, Walter Bright<[email protected]> wrote:

On 1/6/2012 1:43 PM, Manu wrote:
There is actually. To the compiler, the intrinsic is a normal function, with some hook in the code generator to produce the appropriate opcode when it's performing actual code generation.
On most compilers, the inline asm on the other hand, is unknown to the compiler, the optimiser can't do much anymore, because it doesn't know what the inline asm has done, and the code generator just goes and pastes your asm code inline where you told it to. It doesn't know if you've written to aliased variables, called functions, etc.. it can no longer safely rearrange code around the inline asm block.. which means it's not free to pipeline the code efficiently.
And, in fact, the compiler should not try to optimize inline assembler. The IA is there so that the programmer can hand tweak things without the compiler defeating his attempts.
For example, suppose the compiler schedules instructions for processor X. The programmer writes inline asm to schedule for Y, because the compiler doesn't specifically support Y. The compiler goes ahead and reschedules it for X.
Arggh!


Yes, but that's not what I meant.

Consider

__v128 a = load(1), b = load(2);
__v128 c = add(a, b);
__v128 d = add(a, b);

A valid optimization could be:

__v128 b = load(2);
__v128 a = load(1);
__v128 tmp = add(a, b);
__v128 d = tmp;
__v128 c = tmp;

Here load and add are defined as follows:

__v128 load(int v) pure
{
    __v128 res;
    asm (res, v)
    {
        MOVD res, v;
        SHUF res, 0x0000;
    }
    return res;
}

__v128 add(__v128 a, __v128 b) pure
{
    __v128 res = a;
    asm (res, b)
    {
        ADD res, b;
    }
    return res;
}

The compiler might drop the evaluation of d and just reuse the
common subexpression already computed for c.
It might also evaluate d before c.
The important point is to mark those functions as having no side effects,
which can be checked if the instructions are classified.
Thus the compiler can do all kinds of optimizations at the expression level.
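
As a rough sketch of the same idea in D as it can be written today, the helpers below use core.simd vector operations in place of the asm blocks. The bodies of load and add here are stand-ins and not the asm syntax proposed in this message, and the example assumes a target where core.simd defines int4 (x86-64, for instance). Because both functions are pure and take their arguments by value, the compiler is allowed, though not required, to compute add(a, b) once and reuse it.

```d
import core.simd;

// Stand-in for load(): a scalar initializer is broadcast to all four lanes.
int4 load(int v) pure nothrow @nogc
{
    int4 res = v;
    return res;
}

// Stand-in for add(): element-wise addition of two vectors.
int4 add(int4 a, int4 b) pure nothrow @nogc
{
    return a + b;
}

unittest
{
    int4 a = load(1), b = load(2);
    int4 c = add(a, b);
    int4 d = add(a, b); // same pure call with the same arguments as c
    assert(c.array == [3, 3, 3, 3]);
    assert(d.array == c.array);
}
```

Whether the second call is really merged depends on the compiler and its optimization settings; the pure annotation only makes the transformation legal.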

After inlining, it would look like this:

__v128 b;
asm (b) { MOV b, 2; }
__v128 a;
asm (a) { MOV a, 1; }
__v128 tmp;
asm (a, b, tmp) { MOV tmp, a; ADD tmp, b; }
__v128 c;
asm (c, tmp) { MOV c, tmp; }
__v128 d;
asm (d, tmp) { MOV d, tmp; }

Then it will do the usual register assignment, except that
variables must be assigned a register for the asm blocks they
are used in.

This effectively achieves the same as writing this with intrinsics.
It also greatly improves the composability of inline asm.

What dmd does do with the inline assembler is it keeps track of which registers are read/written, so that effective register allocation can be done for the non-asm code.


Which is why the compiler should be the one to allocate pseudo-registers.
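
For contrast, here is a minimal sketch of DMD-style inline assembler as it exists in D, where the programmer names the hardware registers directly. The helper addViaAsm and the X86InlineAsm version flag are made up for illustration. The asm path assumes an x86 or x86-64 target with DMD-style inline asm; anywhere else the function falls back to plain D.

```d
// Map both x86 inline-asm version identifiers onto one local flag.
version (D_InlineAsm_X86)    version = X86InlineAsm;
version (D_InlineAsm_X86_64) version = X86InlineAsm;

// Hypothetical helper: adds two ints through a hand-written asm block.
int addViaAsm(int a, int b)
{
    int res;
    version (X86InlineAsm)
    {
        asm
        {
            mov EAX, a;   // EAX is picked by the programmer, not the allocator
            add EAX, b;
            mov res, EAX; // result goes back through memory
        }
    }
    else
    {
        res = a + b;      // fallback where DMD-style inline asm is unavailable
    }
    return res;
}

unittest
{
    assert(addViaAsm(2, 3) == 5);
}
```

Since EAX is fixed in the source, the compiler can only allocate around it. With an operand list such as asm (res, a, b), the choice of register would be left to the compiler instead.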
