[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

lucier at math dot purdue dot edu Mon, 15 Jun 2009 13:21:27 -0700


------- Comment #103 from lucier at math dot purdue dot edu  2009-06-15 20:21 -------
Regarding comment #101 ...


With

heine:~/programs/gcc/objdirs/gsc-fft-tests/gambc-v4_1_2> /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --prefix=/pkgs/gcc-mainline
--enable-languages=c --disable-multilib --enable-checking=release
Thread model: posix
gcc version 4.5.0 20090608 (experimental) [trunk revision 148276] (GCC) 

(and including Paolo's patch to speed up DF), the routine in direct.c takes

    168 ms cpu time (168 user, 0 system)

As reported here

http://www.math.purdue.edu/~lucier/bugzilla/9/

with gcc-4.2.4, this routine takes 156 ms on the same machine.

Comment #9 gives the code that 4.2.4 generates at the start of the main loop; 
the start of the main loop with the version of 4.5.0 I gave above is:

.L2938:
        movq    %rcx, %rdx
        addq    8(%rax), %rdx
        leaq    4(%rcx), %rbx
        movq    %rdx, -8(%rax)
        leaq    4(%rdx), %rdi
        addq    8(%rax), %rdx
        movq    %rdi, -16(%rax)
        movq    %rdx, -24(%rax)
        leaq    4(%rdx), %rdi
        addq    8(%rax), %rdx
        movq    %rdi, -32(%rax)
        movq    %rdx, -40(%rax)
        leaq    4(%rdx), %rdi
        movq    40(%rax), %rdx
        movq    %rdi, -48(%rax)
        movsd   7(%rdx,%rdi,2), %xmm7
        movq    -40(%rax), %rdi
        leaq    7(%rdx,%rcx,2), %r8
        addq    $8, %rcx
        movsd   (%r8), %xmm4
        cmpq    %rcx, %r13
        movsd   7(%rdx,%rdi,2), %xmm10
        movq    -32(%rax), %rdi
        movsd   7(%rdx,%rdi,2), %xmm5
        movq    -24(%rax), %rdi
        movsd   7(%rdx,%rdi,2), %xmm6
        movq    -16(%rax), %rdi
        movsd   7(%rdx,%rdi,2), %xmm13
        movq    -8(%rax), %rdi
        movsd   7(%rdx,%rdi,2), %xmm11
        leaq    (%rbx,%rbx), %rdi
        movsd   7(%rdi,%rdx), %xmm9
        movq    24(%rax), %rdx
        movapd  %xmm11, %xmm14
        movsd   15(%rdx), %xmm1
        movsd   7(%rdx), %xmm2
        movapd  %xmm1, %xmm8
        movsd   31(%rdx), %xmm3
        movapd  %xmm2, %xmm12
        mulsd   %xmm10, %xmm8
        mulsd   %xmm7, %xmm12
        mulsd   %xmm2, %xmm10
        mulsd   %xmm1, %xmm7
        movsd   23(%rdx), %xmm0

So, to my mind, this is still a 4.5 regression: there is still a slowdown, and
the code generated by 4.5.0 is still much less optimized than the code
generated by 4.2.4.  168/156 ~ 1.08, so if you want to change the Summary of
this bug to "8% regression", or to something else, that's fine, but I've
changed this PR back to being a 4.5 regression.
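
For anyone who wants to redo the arithmetic with other timings, here is a
minimal sketch of the 168/156 calculation. The two numbers are the ones quoted
in this comment; the function and variable names are only illustrative and are
not part of Gambit or GCC.

```python
# CPU times for the routine in direct.c, as quoted in this comment (ms).
time_gcc_424_ms = 156  # gcc-4.2.4
time_gcc_450_ms = 168  # gcc 4.5.0 20090608, trunk revision 148276


def slowdown(baseline_ms, candidate_ms):
    """Return (ratio, percent increase) of candidate time over baseline time.

    Hypothetical helper, named here only for illustration.
    """
    ratio = candidate_ms / baseline_ms
    percent = (ratio - 1.0) * 100.0
    return ratio, percent


ratio, percent = slowdown(time_gcc_424_ms, time_gcc_450_ms)
print(f"ratio = {ratio:.2f}, slowdown = {percent:.0f}%")
```

These are single measurements, so the result is only as good as the
run-to-run variation on the machine allows.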

I was not really thrilled when Richard marked PR 39157 as a duplicate of this
PR.  To my mind, there are three more or less independent things---run time of
Gambit-generated code, compile time of the code, and the space required to
compile the code.  This PR is about run time; PR 39157 was about space needed
by the compiler; PR 26854 is about compile time.  They seem to have all been
mushed together.


-- 

lucier at math dot purdue dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|4.5.0                       |
            Summary|[4.3/4.4 Regression] 30%    |[4.3/4.4/4.5 Regression] 30%
                   |performance slowdown in     |performance slowdown in
                   |floating-point code caused  |floating-point code caused
                   |by  r118475                 |by  r118475


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
