On Sat, 07 Jan 2012 01:06:21 +0100, Walter Bright
<[email protected]> wrote:
> On 1/6/2012 1:43 PM, Manu wrote:
>> There is actually. To the compiler, the intrinsic is a normal function, with
>> some hook in the code generator to produce the appropriate opcode when it's
>> performing actual code generation.
>> On most compilers, the inline asm, on the other hand, is unknown to the
>> compiler. The optimiser can't do much anymore, because it doesn't know what
>> the inline asm has done, and the code generator just goes and pastes your
>> asm code inline where you told it to. It doesn't know if you've written to
>> aliased variables, called functions, etc. It can no longer safely rearrange
>> code around the inline asm block, which means it's not free to pipeline the
>> code efficiently.
>
> And, in fact, the compiler should not try to optimize inline assembler.
> The IA is there so that the programmer can hand tweak things without the
> compiler defeating his attempts.
>
> For example, suppose the compiler schedules instructions for processor
> X. The programmer writes inline asm to schedule for Y, because the
> compiler doesn't specifically support Y. The compiler goes ahead and
> reschedules it for X.
>
> Arggh!
Yes, but that's not what I meant.
Consider:

    __v128 a = load(1), b = load(2);
    __v128 c = add(a, b);
    __v128 d = add(a, b);

A valid optimization could be:

    __v128 b = load(2);
    __v128 a = load(1);
    __v128 tmp = add(a, b);
    __v128 d = tmp;
    __v128 c = tmp;

    __v128 load(int v) pure
    {
        __v128 res;
        asm (res, v)
        {
            MOVD res, v;
            SHUF res, 0x0000;
        }
        return res;
    }

    __v128 add(__v128 a, __v128 b) pure
    {
        __v128 res = a;
        asm (res, b)
        {
            ADD res, b;
        }
        return res;
    }

The compiler might drop the evaluation of d and just reuse the common
subexpression computed for c. It might also evaluate d before c.
The important point is to mark those functions as having no side effects,
which can be checked if the instructions inside the asm blocks are classified.
Thus the compiler can do all kinds of optimizations at the expression level.
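
To make the reuse of c for d concrete, here is a toy model in Python rather than anything a real compiler does. The instruction tuples, the function name eliminate_common_subexpressions and the pure_functions set are all made up for illustration, and the sketch assumes each variable is assigned exactly once.

```python
# Toy model, not dmd's implementation: an instruction is (dest, function, args).
# pure_functions is a hypothetical set of functions known to be side-effect free.

def eliminate_common_subexpressions(program,
                                    pure_functions=frozenset({"load", "add"})):
    """Drop repeated pure calls with identical arguments.

    Returns (new_program, aliases), where aliases maps each dropped
    variable to the variable that already holds the same value.
    Assumes every variable is assigned exactly once.
    """
    seen = {}     # (function, args) -> variable holding that result
    aliases = {}  # dropped variable -> surviving variable
    out = []
    for dest, func, args in program:
        # Rewrite arguments through earlier aliases so equal values compare equal.
        args = tuple(aliases.get(a, a) for a in args)
        key = (func, args)
        if func in pure_functions and key in seen:
            aliases[dest] = seen[key]  # reuse the earlier result
            continue
        if func in pure_functions:
            seen[key] = dest
        out.append((dest, func, args))
    return out, aliases


program = [
    ("a", "load", (1,)),
    ("b", "load", (2,)),
    ("c", "add", ("a", "b")),
    ("d", "add", ("a", "b")),
]
optimized, aliases = eliminate_common_subexpressions(program)
# optimized keeps a, b and c; aliases records that d can use c's value.
```

If the functions are not known to be pure (pass an empty set as pure_functions), nothing may be dropped, which is the situation with an opaque inline asm block.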
After inlining it would look like this:

    __v128 b;
    asm (b) { MOV b, 2; }
    __v128 a;
    asm (a) { MOV a, 1; }
    __v128 tmp;
    asm (a, b, tmp) { MOV tmp, a; ADD tmp, b; }
    __v128 c;
    asm (c, tmp) { MOV c, tmp; }
    __v128 d;
    asm (d, tmp) { MOV d, tmp; }

Then it will do the usual register assignment, except that
variables must be assigned a register for the asm blocks they
are used in.
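
As a rough illustration of that constraint, the following Python sketch gives every variable one register for its whole live range, so it is in a register in each asm block that names it. This is a deliberately naive model, not how dmd or any real allocator works: assign_registers, the block representation and the register names are assumptions for the example, and there is no spilling.

```python
# Naive sketch: each asm block is a tuple of the variables it names, in
# program order. Register names are illustrative only.

def assign_registers(asm_blocks, registers):
    """Map each variable to one register for its entire live range."""
    # A variable is live from its first to its last asm block.
    last_use = {}
    for index, operands in enumerate(asm_blocks):
        for var in operands:
            last_use[var] = index

    free = list(registers)
    assignment = {}
    live = set()
    for index, operands in enumerate(asm_blocks):
        # Release registers of variables whose last use was in an earlier block.
        for var in sorted(live):
            if last_use[var] < index:
                live.remove(var)
                free.insert(0, assignment[var])
        # Every operand of this block must sit in a register.
        for var in operands:
            if var not in assignment:
                if not free:
                    raise RuntimeError("out of registers; a real allocator would spill")
                assignment[var] = free.pop(0)
                live.add(var)
    return assignment


# The inlined example: asm (b), asm (a), asm (a, b, tmp), asm (c, tmp), asm (d, tmp)
blocks = [("b",), ("a",), ("a", "b", "tmp"), ("c", "tmp"), ("d", "tmp")]
assignment = assign_registers(blocks, ["xmm0", "xmm1", "xmm2"])
# c and d both end up in the register that b vacated, so three registers suffice.
```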
This effectively achieves the same as writing this with intrinsics.
It also greatly improves the composability of inline asm.

> What dmd does do with the inline assembler is it keeps track of which
> registers are read/written, so that effective register allocation can be
> done for the non-asm code.

Which is why the compiler should be the one to allocate pseudo-registers.