Hi! I created a test to reproduce the issue with cpu2006/454.calculix. See attached. The file e_c3d.f contains a cut-down subroutine from calculix; tr535.f is the main entry point of the test. You can use the go-script as a reference for how I got these results. The find_stall.pl script finds the problematic instruction combinations.
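For readers without the attachment, the kind of scan that find_stall.pl does can be pictured roughly as follows. This is a Python sketch, not the attached Perl script: the function names and the matching rule (an instruction that reads a memory operand, immediately followed by one that writes the same operand) are assumptions made for illustration, and only two-operand AT&T-syntax instructions are handled.

```python
def split_operands(text):
    """Split an AT&T-syntax operand string on commas outside parentheses."""
    ops, depth, cur = [], 0, ""
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            ops.append(cur.strip())
            cur = ""
        else:
            cur += ch
    if cur.strip():
        ops.append(cur.strip())
    return ops


def find_load_then_store(lines):
    """Return (index, load, store) for each adjacent pair of instructions
    where the first reads a memory operand and the second writes that
    same operand. Labels and directives (lines starting with '.') are
    skipped and do not break adjacency."""
    hits = []
    prev = None  # (index, text, memory source operand) of the last load
    for i, raw in enumerate(lines):
        line = raw.strip()
        if not line or line.startswith(".") or line.endswith(":"):
            continue
        parts = line.split(None, 1)
        ops = split_operands(parts[1]) if len(parts) == 2 else []
        if len(ops) != 2:
            prev = None
            continue
        src, dst = ops
        if prev is not None and "(" in dst and dst == prev[2]:
            hits.append((prev[0], prev[1], line))
        prev = (i, line, src) if "(" in src else None
    return hits


if __name__ == "__main__":
    # Tail of a loop that accumulates through memory on every iteration.
    sample = [
        ".L13:",
        "mulsd (%rax), %xmm0",
        "addsd (%rsi,%r10,8), %xmm0",
        "movsd %xmm0, (%rsi,%r10,8)",
        ".loc 1 38 0",
        "jne .L13",
    ]
    for index, load, store in find_load_then_store(sample):
        print(index, load, "|", store)
```

A loop that keeps its accumulator in a register (for example one ending in addsd %xmm0, %xmm1) produces no hits with this rule.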
The problem is that the new compiler generates a read instruction right after a write. See some dumps below.

This is the inner loop near line #42, generated by the rev. 119759 compiler:

.L13:
.LBB22:
        .loc 1 42 0
        movapd  %xmm2, %xmm0
        leaq    (%rdx,%rbx), %rax
        .loc 1 38 0
        addl    $1, %edi
        addq    $24, %rdx
        .loc 1 42 0
        mulsd   72(%rcx), %xmm0
        .loc 1 38 0
        addq    $72, %rcx
        cmpl    $4, %edi
        .loc 1 42 0
        mulsd   %xmm3, %xmm0
        mulsd   -8(%rax,%r9,8), %xmm0
        mulsd   %xmm4, %xmm0
        addsd   %xmm0, %xmm1
        .loc 1 38 0
        jne     .L13

This is the code for line 42, generated by the rev. 119760 compiler:

.L13:
.LBB23:
        .loc 1 42 0
        movsd   72(%rdx), %xmm0
        movq    80(%rsp), %rax
        addq    $72, %rdx
        mulsd   -8(%r9,%r15,8), %xmm0
        addq    %rdi, %rax
        addq    $24, %rdi
        .loc 1 38 0
        cmpq    $72, %rdi
        .loc 1 42 0
        mulsd   -8(%r11,%r14,8), %xmm0
        mulsd   -8(%rax,%r13,8), %xmm0
        movq    440(%rsp), %rax
        mulsd   (%rax), %xmm0
        addsd   (%rsi,%r10,8), %xmm0    <-|
        movsd   %xmm0, (%rsi,%r10,8)    <-+- problems
        .loc 1 38 0
        jne     .L13

My output is:

real    0m3.781s
user    0m3.776s
sys     0m0.004s

real    0m5.956s
user    0m5.948s
sys     0m0.004s
hey... we are going
hey... we are going
Line 31
        addsd   (%rsi,%r10,8), %xmm0
        movsd   %xmm0, (%rsi,%r10,8)
Line 42
        addsd   (%rsi,%r10,8), %xmm0
        movsd   %xmm0, (%rsi,%r10,8)

Feel free to ask if any problems with reproducing occur.

-Vladimir

------
* From: Grigory Zagorodnev <grigory_zagorodnev at linux dot intel dot com>
* To: gcc at gcc dot gnu dot org, dnovillo at redhat dot com
* Cc: "H. J. Lu" <hjl at lucon dot org>
* Date: Mon, 15 Jan 2007 17:59:31 +0300
* Subject: 27% regression of gcc 4.3 performance on cpu2k6/calculix

Hi!
There is a huge regression of gcc 4.3 performance detected on cpu2006/454.calculix benchmark at -O2 optimization level on x86_64-redhat-linux.

Regression is caused by mem-ssa merge 12/12/2006 (revision 119760).
http://gcc.gnu.org/viewcvs?view=rev&revision=119760

PS: I'm trying to get a small reproducer

- Grigory
test_calculix.tar.bz2
Description: BZip2 compressed data