------- Additional Comments From tbptbp at gmail dot com 2005-01-31 20:18 -------
-fno-gcse is a godsend: instant speedup, and most of the silliness when inlining is gone.

Now I've applied both your patches, and while they're promising they also trigger their own nastiness; gcc is so fond of memory inputs that it dumps stuff on the stack only to address it some instructions later (well, that's my interpretation :).

For example,

  4010c3:  0f 28 6c 13 30     movaps 0x30(%ebx,%edx,1),%xmm5
  4010c8:  0f 28 f9           movaps %xmm1,%xmm7
  4010cb:  0f 28 cb           movaps %xmm3,%xmm1
  4010ce:  0f 29 6c 24 10     movaps %xmm5,0x10(%esp)
  4010d3:  0f 59 ce           mulps  %xmm6,%xmm1
  4010d6:  0f 59 c4           mulps  %xmm4,%xmm0
  4010d9:  0f 28 6c 16 30     movaps 0x30(%esi,%edx,1),%xmm5
  4010de:  0f 59 5c 24 10     mulps  0x10(%esp),%xmm3

or

  40119d:  0f c2 c1 01        cmpltps %xmm1,%xmm0
  4011a1:  0f 29 04 24        movaps %xmm0,(%esp)
  4011a5:  0f 28 c5           movaps %xmm5,%xmm0
  4011a8:  0f c2 c1 01        cmpltps %xmm1,%xmm0
  4011ac:  0f 28 c8           movaps %xmm0,%xmm1
  4011af:  0f 56 0c 24        orps   (%esp),%xmm1

Other than those quirks, it looks better to me.
Just to be sure, I've tried that patched version on my app, and it's slower than the unpatched version (both with -fno-gcse).

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19680