Hi!
I have created a test that reproduces the issue with cpu2006/454.calculix;
see the attachment. The file e_c3d.f contains a subroutine cut out of
calculix, and tr535.f is the main entry point of the test. You can use the
go script as a reference for how I obtained these results. find_stall.pl
is a script that finds the problematic instruction combinations.

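The problem is that the new compiler generates a read instruction right
after a write. In the rev. 119760 loop the running sum lives in memory at
(%rsi,%r10,8): every iteration loads it with addsd and stores it back with
movsd, so each read depends on the store from the previous iteration. The
rev. 119759 loop keeps the sum in %xmm1 instead. See the dumps below.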

This is the inner loop near line 42, as generated by the rev. 119759 compiler:
.L13:
.LBB22:
        .loc 1 42 0
        movapd  %xmm2, %xmm0
        leaq    (%rdx,%rbx), %rax
        .loc 1 38 0
        addl    $1, %edi
        addq    $24, %rdx
        .loc 1 42 0
        mulsd   72(%rcx), %xmm0
        .loc 1 38 0
        addq    $72, %rcx
        cmpl    $4, %edi
        .loc 1 42 0
        mulsd   %xmm3, %xmm0
        mulsd   -8(%rax,%r9,8), %xmm0
        mulsd   %xmm4, %xmm0
        addsd   %xmm0, %xmm1
        .loc 1 38 0
        jne     .L13
        
This is the same loop for line 42, as generated by the rev. 119760 compiler:
.L13:
.LBB23:
        .loc 1 42 0
        movsd   72(%rdx), %xmm0
        movq    80(%rsp), %rax
        addq    $72, %rdx
        mulsd   -8(%r9,%r15,8), %xmm0
        addq    %rdi, %rax
        addq    $24, %rdi
        .loc 1 38 0
        cmpq    $72, %rdi
        .loc 1 42 0
        mulsd   -8(%r11,%r14,8), %xmm0
        mulsd   -8(%rax,%r13,8), %xmm0
        movq    440(%rsp), %rax
        mulsd   (%rax), %xmm0
        addsd   (%rsi,%r10,8), %xmm0     <-|
        movsd   %xmm0, (%rsi,%r10,8)    <-+- problems
        .loc 1 38 0
        jne     .L13



My output is:
real    0m3.781s
user    0m3.776s
sys     0m0.004s

real    0m5.956s
user    0m5.948s
sys     0m0.004s
hey... we are going
hey... we are going
Line 31
       addsd   (%rsi,%r10,8), %xmm0
       movsd   %xmm0, (%rsi,%r10,8)

Line 42
       addsd   (%rsi,%r10,8), %xmm0
       movsd   %xmm0, (%rsi,%r10,8)

Feel free to ask if you have any problems reproducing this.

-Vladimir


------
   * From: Grigory Zagorodnev <grigory_zagorodnev at linux dot intel dot com>
   * To: gcc at gcc dot gnu dot org, dnovillo at redhat dot com
   * Cc: "H. J. Lu" <hjl at lucon dot org>
   * Date: Mon, 15 Jan 2007 17:59:31 +0300
   * Subject: 27% regression of gcc 4.3 performance on cpu2k6/calculix

Hi!
A huge (27%) performance regression in gcc 4.3 has been detected on the
cpu2006/454.calculix benchmark at the -O2 optimization level on
x86_64-redhat-linux.

The regression is caused by the mem-ssa merge of 12/12/2006 (revision 119760).
http://gcc.gnu.org/viewcvs?view=rev&revision=119760


PS: I'm trying to produce a small reproducer.
- Grigory

Attachment: test_calculix.tar.bz2
Description: BZip2 compressed data
