On Mon, Aug 22, 2011 at 6:25 PM, Ilya Tocar <tocarip.in...@gmail.com> wrote:
>> You don't need to add "negated" versions, one FMA builtin per mode is
>> enough, please see existing FMA4 descriptions. Just put unary minus
>> sign in the intrinsics header for "negated" operand and let GCC do its
>> job. Please see existing FMA4 intrinsics header.
>>
> Actually I tried that. But in such a case, when I compile (FMA4 example)
>
> #include <x86intrin.h>
> extern __m128 a, b, c;
> void foo() {
>   a = _mm_nmsub_ps(a, b, c);
> }
>
> with -S -O0 -mfma4, the asm has
>
> vxorps %xmm1, %xmm0, %xmm0
> vmovaps -16(%rbp), %xmm1
> vmovaps .LC0(%rip), %xmm2
> vxorps %xmm2, %xmm1, %xmm1
> vfmaddps %xmm0, -32(%rbp), %xmm1, %xmm0
>
> So vfmaddps of negated values is generated instead of vfnmsubps.
> I think it is bad that the intrinsic for an instruction can generate
> code without this instruction.
> So to make sure that the exact instruction is always generated, I
> introduced additional expands and builtins.
> Is it wrong?

This is an artificial limitation. The user requested the functionality
of the intrinsic and should not bother with how the compiler realizes
it. With -O2, the negation would propagate into the insn during the
combine pass, and the optimal instruction would be generated.

So, to answer your question: it is wrong to expect an exact instruction
from builtins. Maybe when using -O0, but -O0 should not be used in the
testsuite anyway.

Uros.