https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120398

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |target
   Target Milestone|---                         |15.2
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
t.c:8:12: note:   vect_is_simple_use: operand r1_25 = PHI <_11(7), 0.0(6)>, type of def: reduction
t.c:8:12: missed:   reduc op not supported by target.
t.c:8:12: missed:   in-order unchained SLP reductions not supported.

So this is a special case we do not support optimally.  Note that in-order
reductions, especially with the resulting high VF, are necessarily "bad", but
the cost model still thinks vectorizing is profitable:

t.c:8:12: note:  Cost model analysis:
  Vector inside of loop cost: 376
  Vector prologue cost: 0 
  Vector epilogue cost: 336
  Scalar iteration cost: 80
  Scalar outside cost: 32
  Vector outside cost: 336
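
As a rough illustration of how these numbers can combine, here is a simplified sketch of a profitability comparison. It assumes the vector loop body runs niters / VF times and ignores peeling and any other adjustments the vectorizer makes. The VF is not shown in the dump above, so it is a parameter here, and the function names are hypothetical.

```c
/* Costs copied from the cost model dump above. */
enum
{
  VEC_INSIDE = 376,     /* Vector inside of loop cost */
  VEC_OUTSIDE = 336,    /* Vector outside cost */
  SCALAR_ITER = 80,     /* Scalar iteration cost */
  SCALAR_OUTSIDE = 32   /* Scalar outside cost */
};

/* Estimated cost of running the scalar loop for niters iterations. */
long scalar_cost(long niters)
{
  return (long) SCALAR_ITER * niters + SCALAR_OUTSIDE;
}

/* Estimated cost of the vector loop, assuming niters is a multiple of vf
   (a simplification; vf is an assumed input, not taken from the dump). */
long vector_cost(long niters, long vf)
{
  return (long) VEC_INSIDE * (niters / vf) + VEC_OUTSIDE;
}

/* Nonzero if the simplified model prefers the vector loop. */
int vector_looks_profitable(long niters, long vf)
{
  return vector_cost(niters, vf) < scalar_cost(niters);
}
```

Under this simplified model the loop body alone favors vectorization whenever 376 / VF is below 80, that is for any VF of 5 or more, which is consistent with a high VF making the in-order reduction look cheap on paper.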

With -ffast-math you get nice code.  For me, aarch64 does not vectorize the
loop at -O2 unless SVE is enabled, which has in-order reductions.
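
To illustrate why -ffast-math changes the picture: without it, a floating-point sum has to be accumulated in source order, because reassociating the additions can change the result. The sketch below contrasts an in-order reduction with a reassociated one that keeps two independent partial sums. The function names and the input values are hypothetical and are not taken from t.c.

```c
#include <stddef.h>

/* In-order reduction: every addition depends on the previous one, exactly
   as written in the source.  This order must be preserved unless
   reassociation is allowed (for example by -ffast-math). */
double sum_in_order(const double *a, size_t n)
{
  double r = 0.0;
  for (size_t i = 0; i < n; i++)
    r += a[i];
  return r;
}

/* Reassociated reduction with two independent partial sums, mimicking a
   vector accumulator with two lanes (a hypothetical VF of 2). */
double sum_two_lanes(const double *a, size_t n)
{
  double lane0 = 0.0, lane1 = 0.0;
  size_t i = 0;
  for (; i + 1 < n; i += 2)
    {
      lane0 += a[i];
      lane1 += a[i + 1];
    }
  if (i < n)            /* leftover element when n is odd */
    lane0 += a[i];
  return lane0 + lane1; /* final combine of the two lanes */
}
```

Compiled without -ffast-math, the input { 1e30, 1.0, -1e30, 1.0 } gives 1.0 from sum_in_order but 2.0 from sum_two_lanes, because the first 1.0 is absorbed by 1e30 only in the in-order version.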

IMO this is only a costing issue, and only if the vectorized loop is actually
slower than the scalar one (I didn't try to measure).
