With
  gcc-4.6 -Ofast -funroll-all-loops -fno-tree-pre -mveclibabi=acml -m64 -march=amdfam10
sphinx3 runs 5% slower than with
  gcc-4.6 -Ofast -funroll-all-loops -fno-prefetch-loop-arrays -fno-tree-pre -mveclibabi=acml -m64 -march=amdfam10

Prefetching does not cause any slowdown if the vectorizer is turned off or
if -fno-fast-math is used.

I believe the loops responsible are those with reductions, whose
vectorization was enabled by the following commit:
http://gcc.gnu.org/ml/gcc-cvs/2010-05/msg00277.html
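
For illustration only, here is a minimal sketch of the kind of loop I mean: a
floating-point reduction over arrays. It is not taken from the sphinx3
source; the function name weighted_dist and its parameters are hypothetical.
Vectorizing such a loop requires reassociating the float additions, which
would explain why the slowdown goes away with -fno-fast-math (the loop then
stays scalar).

```c
#include <stddef.h>

/* Hypothetical kernel (not from sphinx3): sum of weighted squared
   differences.  The accumulation into `sum` is a float reduction; the
   compiler may only split it into vector partial sums when it is
   allowed to reorder float additions, as under -ffast-math. */
float weighted_dist(const float *x, const float *mean,
                    const float *var, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float d = x[i] - mean[i];   /* sequential array accesses */
        sum += d * d * var[i];      /* reduction into a single scalar */
    }
    return sum;
}
```

Compiling a file containing this function with each of the two flag sets
above and comparing the generated assembly should show whether prefetch
instructions are emitted inside the vectorized loop body.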


-- 
           Summary: CPU2006 482.sphinx3: gcc4.6 5% regression from
                    prefetching of vectorized loop
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: changpeng dot fang at amd dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45391
