> which compiles to a single shufps instruction.
Doesn't that often lead to extra, unnecessary movaps instructions?
For example, the following, which is the same asm block repeated twice:
asm
{
    movaps XMM0, a;
    movaps XMM1, b;
    addps XMM0, XMM1;
    movaps a, XMM0;
}
asm
{
    movaps XMM0, a;
    movaps XMM1, b;
    addps XMM0, XMM1;
    movaps a, XMM0;
}
compiles to
movaps -0x48(%rsp),%xmm0
movaps -0x38(%rsp),%xmm1
addps %xmm1,%xmm0
movaps %xmm0,-0x48(%rsp)
movaps -0x48(%rsp),%xmm0
movaps -0x38(%rsp),%xmm1
addps %xmm1,%xmm0
movaps %xmm0,-0x48(%rsp)
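The first block stores %xmm0 to -0x48(%rsp), and the second block immediately
reloads that same value into %xmm0 and reloads b into %xmm1, even though both
registers still hold those values. Is it possible to avoid needless loading and
storing of values when calling multiple functions that use asm blocks? It also
seems that the compiler doesn't inline functions containing asm.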