On Aug 25, 2005, at 12:47 PM, H. J. Lu wrote:

On Thu, Aug 25, 2005 at 12:37:32PM -0700, Ian Lance Taylor wrote:

Fariborz Jahanian <[EMAIL PROTECTED]> writes:


Forgot to attach the patch:

Index: i386.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.c,v
retrieving revision 1.795.4.33
diff -c -p -r1.795.4.33 i386.c
*** i386.c      15 Aug 2005 23:36:10 -0000      1.795.4.33
--- i386.c      25 Aug 2005 17:08:33 -0000
*************** ix86_rtx_costs (rtx x, int code, int out
*** 15730,15740 ****
         else
         switch (standard_80387_constant_p (x))
           {
!         case 1: /* 0.0 */
!           *total = 1;
!           break;
!         default: /* Other constants */
!           *total = 2;
             break;
           case 0:
           case -1:
--- 15730,15737 ----
         else
         switch (standard_80387_constant_p (x))
           {
!         default: /* All constants */
!           *total = 0;
             break;
           case 0:
           case -1:


For what it's worth, as I told Fariborz, I suspect that returning 0 is
correct for SFmode, but I'm somewhat doubtful for DFmode.  And his
test case is odd since the resulting code has more instructions and is
larger.  I know little about x86 instruction timings, but it seems
surprising that the new sequence is faster.  Maybe the problem is in
using %xmm0 instead of one of the 80387 registers or, since this is
after all merely a constant, one of the general registers.

And in any case this type of thing should be controlled by an entry in
the i386 processor_costs structure.
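
As a rough illustration of that suggestion, the sketch below moves the
hard-coded constant costs from the patch into a per-processor table.
Everything named toy_* is hypothetical and invented for this example;
it is not GCC's actual processor_costs layout, and the numbers only
mirror the values that appear in the diff above (1 and 2 before the
patch, 0 after it).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a per-processor cost table.
   The field names are illustrative and are NOT GCC's real ones.  */
struct toy_processor_costs
{
  int fp_zero_cost;   /* cost of materializing 0.0 */
  int fp_const_cost;  /* cost of the other special FP constants */
};

/* Two illustrative tuning tables (assumed values, taken from the diff).  */
static const struct toy_processor_costs toy_cost_old = { 1, 2 };
static const struct toy_processor_costs toy_cost_new = { 0, 0 };

/* KIND follows the convention visible in the patch: 1 means 0.0, and
   0 or -1 are handled by separate cases.  Returns -1 for kinds that
   this sketch does not cost.  */
static int
toy_fp_constant_cost (const struct toy_processor_costs *costs, int kind)
{
  assert (costs != NULL);
  switch (kind)
    {
    case 0:
    case -1:
      return -1;                    /* left to other code paths */
    case 1:
      return costs->fp_zero_cost;   /* 0.0 */
    default:
      return costs->fp_const_cost;  /* other constants */
    }
}
```

With a table like this, changing the cost for one processor means
editing that processor's entry rather than the switch statement itself.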


I think the problem may be somewhere else. I got the same xmm0 code
sequence on Linux/ia32 with -msse3 -mfpmath=sse. However, I got

        xorl    %eax, %eax
        movq    %rax, 16(%rdi)
        movq    %rax, 8(%rdi)
        movq    %rax, (%rdi)

on Linux/x86-64.


H.J.


Can you try this with -march=pentium4?

- fariborz

