------- Comment #9 from ubizjak at gmail dot com 2007-06-28 08:36 -------
(In reply to comment #7)
> This is what I get without -ftree-vectorize, with -ftree-vectorize (default
> cost model off) and with -ftree-vectorize -fvect-cost-model respectively on an
> AMD x86-64 (with trunk plus the patch posted by Dorit at
> http://gcc.gnu.org/ml/gcc-patches/2007-06/txt00156.txt )
>
> Case 1: (no vectorization)
> gfortran -static -march=opteron -msse3 -O3 -ffast-math -funroll-loops
> pr32084.f90 -o 4.3.novect.out
> time ./4.3.novect.out
> real 0m4.414s
> user 0m4.312s
> sys 0m0.000s
>
> Case 2: (vectorization without cost model)
> gfortran -static -ftree-vectorize -march=opteron -msse3 -O3 -ffast-math
> -funroll-loops -fdump-tree-vect-details -fno-show-column pr32084.f90 -o
> 4.3.nocost.out
> time ./4.3.nocost.out
> real 0m4.776s
> user 0m4.668s
> sys 0m0.004s
>
> In short, the 8% advantage that the scalar version has over the vector version
> disappears with the cost model.
>
> Unless I am missing something, the inner loops at lines 207 and 319 (do k = 1,
> 9) dont get vectorized (irrespective of the cost model).

No, it is OK (but for core2 and nocona, -ftree-vectorize has a 50% disadvantage compared to the scalar versions). The problem is that the vectorized loop is no longer unrolled by the RTL unroller. My speculation is that, with the vectorized loop unrolled, the vectorized version would run _faster_ than the scalar versions.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32084