http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48636



--- Comment #22 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-10-16 20:58:58 UTC ---

With the patch I see a ~10% slowdown in Test4 - Lapack 2 (1001x1001) of
test_fpu.f90 compared to revision 192449.



[macbook] lin/test% time /opt/gcc/gcc4.8c/bin/gfortran -fprotect-parens -Ofast -funroll-loops test_lap.f90
6.742u 0.097s 0:06.87 99.4%    0+0k 0+20io 0pf+0w
[macbook] lin/test% a.out
  Benchmark running, hopefully as only ACTIVE task
Test4 - Lapack 2 (1001x1001) inverts  2.6 sec  Err= 0.000000000000250
                             total =  2.6 sec



[macbook] lin/test% time gfc -fprotect-parens -Ofast -funroll-all-loops test_lap.f90
9.489u 0.116s 0:09.62 99.6%    0+0k 0+16io 0pf+0w
[macbook] lin/test% a.out
  Benchmark running, hopefully as only ACTIVE task
Test4 - Lapack 2 (1001x1001) inverts  2.8 sec  Err= 0.000000000000250
                             total =  2.8 sec



This looks similar to what I saw in comment #5. However, dgetri is now never
inlined, while dgetrf is inlined with the patch. dtrmv and dscal are also
inlined with the patch (20 and 21 occurrences, respectively, without the
patch). The last difference I see is 35 occurrences of dswap with the patch,
compared to 32 without.
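
For anyone wanting to reproduce occurrence counts like the ones above, here is
a minimal sketch. It assumes the counts are taken from the assembly that
gfortran emits with -S, and that an "occurrence" is a call instruction whose
target is the routine; the comment does not say how the numbers were actually
obtained. The helper name count_calls and the symbol-matching pattern (optional
leading underscore as on Darwin, trailing underscore from gfortran name
mangling, optional @PLT suffix) are illustrative assumptions, not part of any
GCC tool.

```python
import re

# Hypothetical helper: count "call <routine>" instructions in assembly text.
# Assumed symbol shapes: "_dswap_" (Darwin), "dswap_" (ELF), "dswap_@PLT".
CALL_RE = re.compile(r"^\s*callq?\s+_?([A-Za-z0-9]+?)_?(?:@PLT)?\s*$")

def count_calls(asm_text, routines):
    """Return {routine: number of call instructions targeting it}."""
    counts = {name: 0 for name in routines}
    for line in asm_text.splitlines():
        match = CALL_RE.match(line)
        if match and match.group(1) in counts:
            counts[match.group(1)] += 1
    return counts

if __name__ == "__main__":
    import sys
    # Usage (assumed workflow): python count_calls.py test_lap.s
    with open(sys.argv[1]) as handle:
        text = handle.read()
    names = ["dgetri", "dgetrf", "dtrmv", "dscal", "dswap"]
    for name, n in count_calls(text, names).items():
        print(f"{name}: {n}")
```

A routine that has been inlined everywhere would show a count of 0 under this
scheme, so comparing the output for the two compilers gives a rough picture of
the inlining differences described above.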
