http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51499

--- Comment #5 from Ira Rosen <irar at il dot ibm.com> 2011-12-11 13:30:41 UTC ---
(In reply to comment #4)
> Looks like there has been some great progress in gcc 4.7!
> 
> Still I think it behaves slightly buggy.
> 
> (1) In this case it should work without -funsafe-math-optimizations but
>     it doesn't. gcc 4.7 requires -fno-signed-zeros -fno-trapping-math
>    -fassociative-math to make it work.
> 

This is a reduction: when we vectorize it, we change the order of computation.
To be allowed to do that for floating point, we need flag_associative_math
(-fassociative-math).
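
To illustrate why the order matters, here is a minimal sketch. It is not the
test case from this PR; the function names and the input values are made up
for illustration. The first function adds in source order, the second one adds
in the order a 4-lane vector reduction would effectively use (four partial
sums that are combined at the end). It assumes the element count is a multiple
of 4 to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>

// Strict left-to-right reduction: the order the source code specifies.
float sum_in_order(const float *a, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

// Hypothetical model of a 4-lane vectorized reduction: lane k accumulates
// a[k], a[k+4], a[k+8], ... and the four partial sums are combined last.
float sum_four_lanes(const float *a, std::size_t n) {
    assert(n % 4 == 0);  // simplification for this sketch: no leftover elements
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (std::size_t i = 0; i < n; i += 4)
        for (std::size_t k = 0; k < 4; ++k)
            lane[k] += a[i + k];
    return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}
```

For an input such as {1e8f, 1, 1, 1, -1e8f, 1, 1, 1}, the in-order version
typically loses the small addends next to 1e8f in single precision, while the
4-lane version cancels 1e8f against -1e8f first, so the two results can
differ. That difference is what -fassociative-math gives the compiler
permission to introduce.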

> (2) The prediction:
>        7: not vectorized: vectorization not profitable.
>     is just wrong. Forcing it with -fno-vect-cost-model shows it speeds up
>     by factor of 2.
> 
> (3) If I change all double's into float's in the code above it seems to
>     work without forcing it (-fno-vect-cost-model):
> 
> 
>    g++-4.7 -S -Wall -O2  -ftree-vectorize -ftree-vectorizer-verbose=2 \
>            -funsafe-math-optimizations test.cpp
> 
>    Analyzing loop at test.cpp:7
> 
> 
>    Vectorizing loop at test.cpp:7
> 
>    7: vectorizing stmts using SLP.
>    7: LOOP VECTORIZED.
>    test.cpp:4: note: vectorized 1 loops in function.
> 
> 
>     However, it hasn't vectorized it at all as the assembly shows:
> 
> .L11:
>     addq    $1, %rax
>     addss    %xmm0, %xmm3
>     cmpq    %rax, %rdi
>     addss    %xmm0, %xmm4
>     addss    %xmm0, %xmm7
>     addss    %xmm0, %xmm6
>     addss    %xmm0, %xmm5
>     addss    %xmm0, %xmm1
>     ja    .L11


I think you are looking at the scalar epilogue. The number of iterations is
unknown at compile time, so we need an epilogue loop for the case where the
number of iterations is not a multiple of 4.
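
As a rough sketch of that loop structure (hand-written scalar code standing in
for what the vectorizer generates; the function name and the operation are
hypothetical, not taken from test.cpp): a main loop handles 4 floats per
iteration, which corresponds to one 128-bit SSE register, and a scalar
epilogue handles the 0 to 3 elements that remain.

```cpp
#include <cassert>
#include <cstddef>

// Adds x to every element of a[0..n). The trip count n is only known at
// run time, so the loop is split into a 4-wide main loop and an epilogue.
void add_to_all(float *a, std::size_t n, float x) {
    std::size_t i = 0;

    // Main loop: stands in for the vector loop, 4 elements per iteration.
    for (; i + 4 <= n; i += 4) {
        a[i]     += x;
        a[i + 1] += x;
        a[i + 2] += x;
        a[i + 3] += x;
    }

    // At most 3 elements are left over at this point.
    assert(n - i < 4);

    // Scalar epilogue: one element per iteration, like the addss loop.
    for (; i < n; ++i)
        a[i] += x;
}
```

With n = 7, for example, the main loop runs once and the epilogue runs three
times. In the generated assembly the vector loop would be expected to use
packed instructions (addps rather than addss) under a different label.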
