On Sat, 07 Jan 2012 01:06:21 +0100, Walter Bright wrote:

> On 1/6/2012 1:43 PM, Manu wrote:
>> There is, actually. To the compiler, an intrinsic is a normal function with a hook in the code generator that emits the appropriate opcode when it performs the actual code generation.
>> On most compilers, inline asm, on the other hand, is opaque to the compiler. The optimiser can't do much any more, because it doesn't know what the inline asm has done, and the code generator just pastes your asm code inline where you told it to. It doesn't know whether you've written to aliased variables, called functions, etc., so it can no longer safely rearrange code around the inline asm block, which means it isn't free to pipeline the code efficiently.

> And, in fact, the compiler should not try to optimize inline assembler. The IA is there so that the programmer can hand tweak things without the compiler defeating his attempts.
>
> For example, suppose the compiler schedules instructions for processor X. The programmer writes inline asm to schedule for Y, because the compiler doesn't specifically support Y. The compiler goes ahead and reschedules it for X.
>
> Arggh!

Yes, but that's not what I meant.

Consider

__v128 a = load(1), b = load(2);
__v128 c = add(a, b);
__v128 d = add(a, b);

A valid optimization would be:

__v128 b = load(2);
__v128 a = load(1);
__v128 tmp = add(a, b);
__v128 d = tmp;
__v128 c = tmp;

__v128 load(int v) pure
{
    __v128 res;
    asm (res, v)
    {
        MOVD res, v;
        PSHUFD res, res, 0;
    }
    return res;
}

__v128 add(__v128 a, __v128 b) pure
{
    __v128 res = a;
    asm (res, b)
    {
        ADD res, b;
    }
    return res;
}

The compiler might drop the evaluation of d and simply reuse the
common subexpression already computed for c.
It might also evaluate d before c.
The important point is to mark those functions as side-effect free
(here via pure), which the compiler can verify if the asm
instructions are classified by their effects.
The compiler can then apply all kinds of expression-level optimizations.

After inlining, it would look like this:

__v128 b;
asm (b) { MOV b, 2; }
__v128 a;
asm (a) { MOV a, 1; }
__v128 tmp;
asm (a, b, tmp) { MOV tmp, a; ADD tmp, b; }
__v128 c;
asm (c, tmp) { MOV c, tmp; }
__v128 d;
asm (d, tmp) { MOV d, tmp; }

The compiler then performs the usual register allocation, except
that every variable used in an asm block must be assigned a
register for that block.

This effectively achieves the same result as writing the code with
intrinsics. It also greatly improves the composability of inline asm.
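For comparison, GCC and Clang already offer something close to the operand lists proposed above through extended inline asm: each asm statement declares its outputs and inputs with constraints, the compiler chooses the registers, and a statement without `volatile` is treated as a function of its inputs, so the optimizer may merge, move, or drop it. What follows is a minimal C sketch, assuming a GCC-compatible compiler targeting x86. The name `asm_add` is hypothetical, and it uses a scalar ADD instead of the 128-bit vector type above so that other targets can fall back to plain C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns a + b computed by an inline asm statement.
 * The operand lists tell the compiler what the asm reads and writes:
 *   "+r"(res)  res is read and written, kept in a compiler-chosen register
 *   "r"(b)     b is only read, also in a compiler-chosen register
 *   "cc"       the condition flags are clobbered
 * The statement is not volatile and touches no memory, so the optimizer
 * may treat it like a pure function of res and b. */
static inline int32_t asm_add(int32_t a, int32_t b)
{
    int32_t res = a;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("addl %1, %0" : "+r"(res) : "r"(b) : "cc");
#else
    res = a + b; /* plain C fallback for other compilers/targets */
#endif
    return res;
}
```

Given `int32_t c = asm_add(x, y); int32_t d = asm_add(x, y);`, an optimizing build is allowed to emit the ADD only once and reuse its result for both, which is the kind of transformation described above.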


> What dmd does do with the inline assembler is keep track of which registers are read/written, so that effective register allocation can be done for the non-asm code.

Which is why the compiler should be the one to allocate pseudo-registers.
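In GCC-style extended asm, a scratch register is requested the same way: it is declared as an extra output operand and the register allocator picks it, instead of the programmer hard-coding a register and listing it as clobbered. Below is a minimal sketch under the same assumptions (GCC-compatible compiler, x86, hypothetical function name `twice_plus`). The `&` early-clobber modifier tells the allocator that an output is written before all inputs have been read, so that output must not share a register with any input.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 2*a + b.
 * The scratch value lives in tmp, an extra output operand, so the
 * compiler (not the programmer) chooses which register holds it. */
static int32_t twice_plus(int32_t a, int32_t b)
{
    int32_t res;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    int32_t tmp;
    __asm__("movl %2, %1\n\t"  /* tmp  = a   */
            "addl %2, %1\n\t"  /* tmp += a   */
            "movl %1, %0\n\t"  /* res  = tmp */
            "addl %3, %0"      /* res += b   */
            /* early-clobber: both outputs are written before b is read */
            : "=&r"(res), "=&r"(tmp)
            : "r"(a), "r"(b)
            : "cc");
    (void)tmp; /* scratch value is not needed after the asm */
#else
    res = 2 * a + b; /* plain C fallback for other compilers/targets */
#endif
    return res;
}
```

Without the early-clobber markers, the allocator would be free to put res in the same register as b, and the third instruction would then overwrite b before the final ADD reads it.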
