http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51499
--- Comment #5 from Ira Rosen <irar at il dot ibm.com> 2011-12-11 13:30:41 UTC ---
(In reply to comment #4)
> Looks like there has been some great progress in gcc 4.7!
>
> Still I think it behaves slightly buggy.
>
> (1) In this case it should work without -funsafe-math-optimizations but
> it doesn't. gcc 4.7 requires -fno-signed-zeros -fno-trapping-math
> -fassociative-math to make it work.
>

It's a reduction. When we vectorize it, we change the order of computation, and to be allowed to do that for floating point we need flag_associative_math (set by -fassociative-math, which only takes effect together with -fno-signed-zeros and -fno-trapping-math).

> (2) The prediction:
> 7: not vectorized: vectorization not profitable.
> is just wrong. Forcing it with -fno-vect-cost-model shows it speeds up
> by factor of 2.
>
> (3) If I change all double's into float's in the code above it seems to
> work without forcing it (-fno-vect-cost-model):
>
>
> g++-4.7 -S -Wall -O2 -ftree-vectorize -ftree-vectorizer-verbose=2 \
>     -funsafe-math-optimizations test.cpp
>
> Analyzing loop at test.cpp:7
>
>
> Vectorizing loop at test.cpp:7
>
> 7: vectorizing stmts using SLP.
> 7: LOOP VECTORIZED.
> test.cpp:4: note: vectorized 1 loops in function.
>
>
> However, it hasn't vectorized it at all as the assembly shows:
>
> .L11:
>         addq    $1, %rax
>         addss   %xmm0, %xmm3
>         cmpq    %rax, %rdi
>         addss   %xmm0, %xmm4
>         addss   %xmm0, %xmm7
>         addss   %xmm0, %xmm6
>         addss   %xmm0, %xmm5
>         addss   %xmm0, %xmm1
>         ja      .L11

I think you are looking at the scalar epilogue. The number of iterations is unknown at compile time, so we need an epilogue loop for the case where the number of iterations is not a multiple of 4.
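A minimal C++ sketch of both points above, written by hand rather than taken from GCC's output: it assumes a vectorization factor of 4 (four floats per 128-bit SSE register, as in the float case), models the vector register as four "lanes" of partial sums, and uses hypothetical function names (sum_in_order, sum_vector_order, epilogue_count). It shows that the lane-wise order reassociates the additions, and that the leftover n % 4 elements go through a scalar epilogue loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed vectorization factor: 4 floats per 128-bit SSE register.
constexpr std::size_t kVF = 4;
static_assert(kVF == 4, "the horizontal add below is written for 4 lanes");

// Scalar reduction: strictly left to right, the order the source specifies.
float sum_in_order(const std::vector<float>& a) {
    float s = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i)
        s += a[i];
    return s;
}

// Hypothetical helper: trailing elements left over for the scalar epilogue.
std::size_t epilogue_count(std::size_t n) {
    return n % kVF;
}

// Sketch of the evaluation order of a vectorized reduction:
// kVF independent partial sums, combined at the end, then a scalar epilogue.
float sum_vector_order(const std::vector<float>& a) {
    const std::size_t n = a.size();
    const std::size_t main_end = n - epilogue_count(n);
    float lane[kVF] = {};
    // "Vector" loop: only whole groups of kVF elements, one element per lane.
    for (std::size_t i = 0; i < main_end; i += kVF)
        for (std::size_t l = 0; l < kVF; ++l)
            lane[l] += a[i + l];
    // Horizontal add of the lanes: this is where the sum is reassociated.
    float s = (lane[0] + lane[1]) + (lane[2] + lane[3]);
    // Scalar epilogue: the remaining n % kVF elements, one at a time.
    for (std::size_t i = main_end; i < n; ++i)
        s += a[i];
    return s;
}
```

With the input {16777216, 1, 1, 1, -16777216, 1, 1} (n = 7), IEEE-754 single precision with round-to-nearest gives 2 for the in-order sum and 4 for the lane-wise order, because 2^24 + 1 is not representable in float and the two orders lose different 1s. That difference is why the reduction is only reordered under flag_associative_math. The last three elements are handled by the epilogue, since 7 is not a multiple of 4.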