------- Comment #9 from dominiq at lps dot ens dot fr 2010-08-24 11:47 -------
> Do you see the slowdown as well if you drop -funroll-loops?

Yes

[macbook] lin/test% gfc -Ofast test_fpu_red.f90
[macbook] lin/test% time a.out
Test1 - Gauss 2000 (101x101) inverts 3.0 sec Err= 0.000000000000006
3.208u 0.072s 0:03.28 99.6% 0+0k 0+0io 0pf+0w
[macbook] lin/test% gfcp -Ofast test_fpu_red.f90
[macbook] lin/test% time a.out
Test1 - Gauss 2000 (101x101) inverts 2.2 sec Err= 0.000000000000006
2.440u 0.076s 0:02.52 99.6% 0+0k 0+0io 0pf+0w

> Do you see the slowdown with just -O2?

No

[macbook] lin/test% gfc -O2 test_fpu_red.f90
[macbook] lin/test% time a.out
Test1 - Gauss 2000 (101x101) inverts 3.1 sec Err= 0.000000000000006
3.328u 0.071s 0:03.40 99.7% 0+0k 0+0io 0pf+0w
[macbook] lin/test% gfcp -O2 test_fpu_red.f90
[macbook] lin/test% time a.out
Test1 - Gauss 2000 (101x101) inverts 3.1 sec Err= 0.000000000000006
3.330u 0.073s 0:03.40 100.0% 0+0k 0+0io 0pf+0w

but I see it with -O2 -ftree-vectorize

[macbook] lin/test% gfc -O2 -ftree-vectorize test_fpu_red.f90
[macbook] lin/test% time a.out
Test1 - Gauss 2000 (101x101) inverts 3.1 sec Err= 0.000000000000006
3.318u 0.070s 0:03.39 99.7% 0+0k 0+0io 0pf+0w
[macbook] lin/test% gfcp -O2 -ftree-vectorize test_fpu_red.f90
[macbook] lin/test% time a.out
Test1 - Gauss 2000 (101x101) inverts 2.3 sec Err= 0.000000000000006
2.498u 0.076s 0:02.57 99.6% 0+0k 0+0io 0pf+0w

although I do not see any difference in the outputs with
-ftree-vectorizer-verbose=2.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45379