------- Comment #103 from lucier at math dot purdue dot edu 2009-06-15 20:21 -------
Regarding comment #101 ...
With

heine:~/programs/gcc/objdirs/gsc-fft-tests/gambc-v4_1_2> /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline --enable-languages=c --disable-multilib --enable-checking=release
Thread model: posix
gcc version 4.5.0 20090608 (experimental) [trunk revision 148276] (GCC)

(and including Paolo's patch to speed up DF), the routine in direct.c takes

    168 ms cpu time (168 user, 0 system)

As reported here

http://www.math.purdue.edu/~lucier/bugzilla/9/

with gcc-4.2.4, this routine takes 156 ms on the same machine.

Comment #9 gives the code that 4.2.4 generates at the start of the main loop;
the start of the main loop with the version of 4.5.0 I gave above is:

.L2938:
        movq    %rcx, %rdx
        addq    8(%rax), %rdx
        leaq    4(%rcx), %rbx
        movq    %rdx, -8(%rax)
        leaq    4(%rdx), %rdi
        addq    8(%rax), %rdx
        movq    %rdi, -16(%rax)
        movq    %rdx, -24(%rax)
        leaq    4(%rdx), %rdi
        addq    8(%rax), %rdx
        movq    %rdi, -32(%rax)
        movq    %rdx, -40(%rax)
        leaq    4(%rdx), %rdi
        movq    40(%rax), %rdx
        movq    %rdi, -48(%rax)
        movsd   7(%rdx,%rdi,2), %xmm7
        movq    -40(%rax), %rdi
        leaq    7(%rdx,%rcx,2), %r8
        addq    $8, %rcx
        movsd   (%r8), %xmm4
        cmpq    %rcx, %r13
        movsd   7(%rdx,%rdi,2), %xmm10
        movq    -32(%rax), %rdi
        movsd   7(%rdx,%rdi,2), %xmm5
        movq    -24(%rax), %rdi
        movsd   7(%rdx,%rdi,2), %xmm6
        movq    -16(%rax), %rdi
        movsd   7(%rdx,%rdi,2), %xmm13
        movq    -8(%rax), %rdi
        movsd   7(%rdx,%rdi,2), %xmm11
        leaq    (%rbx,%rbx), %rdi
        movsd   7(%rdi,%rdx), %xmm9
        movq    24(%rax), %rdx
        movapd  %xmm11, %xmm14
        movsd   15(%rdx), %xmm1
        movsd   7(%rdx), %xmm2
        movapd  %xmm1, %xmm8
        movsd   31(%rdx), %xmm3
        movapd  %xmm2, %xmm12
        mulsd   %xmm10, %xmm8
        mulsd   %xmm7, %xmm12
        mulsd   %xmm2, %xmm10
        mulsd   %xmm1, %xmm7
        movsd   23(%rdx), %xmm0

So, to my mind, this is still a 4.5 regression, as there is still a slow-down
and the code is still much less optimized by 4.5.0 than by 4.2.4.
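As a quick check of the arithmetic on the two timings quoted above (168 ms with 4.5.0, 156 ms with 4.2.4), here is a minimal Python sketch. The function name slowdown_percent is made up for illustration and is not part of any tool mentioned in this PR.

```python
def slowdown_percent(new_ms: float, old_ms: float) -> float:
    """Return how much slower new_ms is than old_ms, in percent."""
    return (new_ms / old_ms - 1.0) * 100.0


# Timings taken from the comment: gcc 4.5.0 trunk vs. gcc 4.2.4.
NEW_MS = 168
OLD_MS = 156

ratio = NEW_MS / OLD_MS
percent = slowdown_percent(NEW_MS, OLD_MS)

print(f"ratio    = {ratio:.3f}")
print(f"slowdown = {percent:.1f}%")
```

The ratio rounds to 1.08, i.e. a slowdown of a little under 8%, well below the 30% currently stated in the Summary.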
168/156 ~ 1.08, so if you want to change the Summary of this bug to 8%
regression, or some other things, that's fine, but I've changed this PR back to
being a 4.5 regression.

I was not really thrilled when Richard marked PR 39157 as a duplicate of this
PR. To my mind, there are three more or less independent things---run time of
Gambit-generated code, compile time of the code, and the space required to
compile the code. This PR is about run time; PR 39157 was about space needed by
the compiler; PR 26854 is about compile time. They seem to have all been mushed
together.

--

lucier at math dot purdue dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|4.5.0                       |
            Summary|[4.3/4.4 Regression] 30%    |[4.3/4.4/4.5 Regression] 30%
                   |performance slowdown in     |performance slowdown in
                   |floating-point code caused  |floating-point code caused
                   |by r118475                  |by r118475

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928