------- Comment #9 from ubizjak at gmail dot com  2007-06-28 08:36 -------
(In reply to comment #7)
> This is what I get without -ftree-vectorize, with -ftree-vectorize (default
> cost model off) and with -ftree-vectorize -fvect-cost-model respectively on an
> AMD x86-64 (with trunk plus the patch posted by Dorit at
> http://gcc.gnu.org/ml/gcc-patches/2007-06/txt00156.txt )
> 
> Case 1: (no vectorization)
> gfortran -static -march=opteron -msse3 -O3 -ffast-math -funroll-loops
> pr32084.f90 -o 4.3.novect.out
> time ./4.3.novect.out
> real    0m4.414s
> user    0m4.312s
> sys     0m0.000s
> 
> Case 2: (vectorization without cost model)
> gfortran -static -ftree-vectorize -march=opteron -msse3 -O3 -ffast-math
> -funroll-loops -fdump-tree-vect-details -fno-show-column pr32084.f90 -o
> 4.3.nocost.out
> time ./4.3.nocost.out
> real    0m4.776s
> user    0m4.668s
> sys     0m0.004s
>
> In short, the 8% advantage that the scalar version has over the vector version
> disappears with the cost model.
> 
> Unless I am missing something, the inner loops at lines 207 and 319 (do k = 1,
> 9) don’t get vectorized (irrespective of the cost model).

No, it is OK (but for core2 and nocona, -ftree-vectorize has a 50% disadvantage
compared to the scalar version). The problem is that the vectorized loop is no
longer unrolled by the RTL unroller. My speculation is that, with the
vectorized loop unrolled, the vectorized version would run _faster_ than the
scalar version.
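
To illustrate what unrolling a vectorized loop means, here is a minimal sketch
in C. It is not taken from pr32084.f90 and is not what GCC emits. The kernel
(a saxpy-style update), the vectorization factor VF = 4, and the helper
vec_madd are all assumptions made for the example; vec_madd merely stands in
for a single SSE vector operation.

```c
#include <stddef.h>

#define VF 4  /* assumed vectorization factor: 4 floats per 128-bit register */

/* Hypothetical stand-in for one vector multiply-add over VF lanes. */
static void vec_madd(float *y, const float *x, float a)
{
    for (int lane = 0; lane < VF; lane++)
        y[lane] += a * x[lane];
}

/* Vectorized loop, not unrolled: one vector operation per iteration. */
void saxpy_vect(float *y, const float *x, float a, size_t n)
{
    size_t i = 0;
    for (; i + VF <= n; i += VF)
        vec_madd(&y[i], &x[i], a);
    for (; i < n; i++)              /* scalar epilogue for leftover elements */
        y[i] += a * x[i];
}

/* Same loop unrolled by two: two independent vector operations per
   iteration, so half as many branches and induction-variable updates. */
void saxpy_vect_unrolled(float *y, const float *x, float a, size_t n)
{
    size_t i = 0;
    for (; i + 2 * VF <= n; i += 2 * VF) {
        vec_madd(&y[i], &x[i], a);
        vec_madd(&y[i + VF], &x[i + VF], a);
    }
    for (; i + VF <= n; i += VF)    /* at most one remaining full vector */
        vec_madd(&y[i], &x[i], a);
    for (; i < n; i++)              /* scalar epilogue for leftover elements */
        y[i] += a * x[i];
}
```

Both functions compute the same result. Whether the unrolled form is actually
faster on a given CPU is exactly the open question raised above, and the
sketch does not settle it.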


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32084
