[Bug rtl-optimization/32084] gfortran 4.3 13%-18% slower for induct.f90 than gcc 4.0-based competitor

ubizjak at gmail dot com Thu, 28 Jun 2007 01:36:20 -0700


------- Comment #9 from ubizjak at gmail dot com  2007-06-28 08:36 -------
(In reply to comment #7)
> This is what I get without -ftree-vectorize, with -ftree-vectorize (default
> cost model off) and with -ftree-vectorize -fvect-cost-model respectively on an
> AMD x86-64 (with trunk plus the patch posted by Dorit at
> http://gcc.gnu.org/ml/gcc-patches/2007-06/txt00156.txt )
> 
> Case 1: (no vectorization)
> gfortran -static -march=opteron -msse3 -O3 -ffast-math -funroll-loops
> pr32084.f90 -o 4.3.novect.out
> time ./4.3.novect.out
> real    0m4.414s
> user    0m4.312s
> sys     0m0.000s
> 
> Case 2: (vectorization without cost model)
> gfortran -static -ftree-vectorize -march=opteron -msse3 -O3 -ffast-math
> -funroll-loops -fdump-tree-vect-details -fno-show-column pr32084.f90 -o
> 4.3.nocost.out
> time ./4.3.nocost.out
> real    0m4.776s
> user    0m4.668s
> sys     0m0.004s
>
> In short, the 8% advantage that the scalar version has over the vector version
> disappears with the cost model.
> 
> Unless I am missing something, the inner loops at lines 207 and 319 (do k = 1,
> 9) don't get vectorized (irrespective of the cost model).


No, it is OK (although on core2 and nocona, -ftree-vectorize has a 50%
disadvantage compared to the scalar version). The problem is that the
vectorized loop is no longer unrolled by the RTL unroller. My speculation is
that, with the vectorized loop unrolled, the vectorized version will run
_faster_ than the scalar version.
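
To make the speculation concrete, here is a minimal C sketch of the three loop
shapes being compared. It is not the induct.f90 kernel: the function names are
hypothetical, the "vector" steps are written out by hand as two doubles per
step (assuming one 128-bit SSE register holds two doubles), and it only shows
the structure of the transformation, not what GCC actually emits.

```c
#include <stddef.h>

/* Scalar reference: one element per iteration. */
void axpy_scalar(double a, const double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Vectorized shape: one two-double step per iteration, followed by a
   scalar epilogue for the leftover element when n is odd. */
void axpy_vec2(double a, const double *x, double *y, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
    }
    for (; i < n; i++)
        y[i] += a * x[i];
}

/* Vectorized and then unrolled by 2: two two-double steps per iteration,
   so the loop counter update and branch are paid once per four elements. */
void axpy_vec2_unroll2(double a, const double *x, double *y, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        /* first vector step */
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
        /* second vector step */
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }
    for (; i < n; i++)
        y[i] += a * x[i];
}
```

All three functions compute the same result; only the amount of work per loop
iteration differs. Whether the third shape is actually faster on a given CPU
has to be measured.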


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32084
