On Mon, Aug 22, 2011 at 6:25 PM, Ilya Tocar <tocarip.in...@gmail.com> wrote:

>> You don't need to add "negated" versions; one FMA builtin per mode is
>> enough. Please see the existing FMA4 descriptions. Just put a unary minus
>> sign in the intrinsics header for the "negated" operand and let GCC do its
>> job. Please see the existing FMA4 intrinsics header.
>>
> Actually, I tried that. But in that case, when I compile this FMA4 example:
> #include <x86intrin.h>
> extern  __m128 a,b,c;
> void foo(){
>   a = _mm_nmsub_ps(a,b,c);
> }
> with -S -O0 -mfma4, the asm has:
>
>        vxorps  %xmm1, %xmm0, %xmm0
>        vmovaps -16(%rbp), %xmm1
>        vmovaps .LC0(%rip), %xmm2
>        vxorps  %xmm2, %xmm1, %xmm1
>        vfmaddps        %xmm0, -32(%rbp), %xmm1, %xmm0
> So vfmaddps on negated values is generated instead of vfnmsubps.
> I think it is bad that an intrinsic for an instruction can generate code
> without that instruction.
> So, to make sure that the exact instruction is always generated, I
> introduced additional expands and builtins.
> Is that wrong?

This is an artificial limitation. The user requested the functionality of the
intrinsic and should not have to care how the compiler realizes it.
With -O2, the negation would propagate into the insn during the combine pass,
and the optimal instruction would be generated.
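
As a rough illustration of the "one primitive plus unary minus" approach,
here is a scalar sketch. It is not the GCC intrinsics header: the function
names are hypothetical, and fma_primitive() is a plain multiply-add standing
in for the single per-mode builtin (so it is not actually fused and rounds
differently from real FMA hardware). The point is only that the negated
variants need nothing beyond negating operands of the one primitive.

```c
/* Hypothetical stand-in for the single FMA builtin: computes a*b + c.
   Assumption: plain arithmetic is used here, not a fused operation. */
static float fma_primitive(float a, float b, float c)
{
    return a * b + c;
}

/* The four flavours, each expressed by negating operands of the primitive. */
static float madd(float a, float b, float c)  { return fma_primitive(a, b, c); }   /*   a*b + c  */
static float msub(float a, float b, float c)  { return fma_primitive(a, b, -c); }  /*   a*b - c  */
static float nmacc(float a, float b, float c) { return fma_primitive(-a, b, c); }  /* -(a*b) + c */
static float nmsub(float a, float b, float c) { return fma_primitive(-a, b, -c); } /* -(a*b) - c */
```

With a = 2, b = 3, c = 1 these evaluate to 7, 5, -5 and -7 respectively.
Which machine instruction implements each one is then left to the compiler.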

So, to answer your question: it is wrong to expect an exact instruction
from builtins. You may not get it at -O0, but -O0 should not be used
in the testsuite anyway.

Uros.
