https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120398

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |target
   Target Milestone|---                         |15.2
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
t.c:8:12: note:   vect_is_simple_use: operand r1_25 = PHI <_11(7), 0.0(6)>, type of def: reduction
t.c:8:12: missed:   reduc op not supported by target.
t.c:8:12: missed:   in-order unchained SLP reductions not supported.

so this is a special-case we do not support optimally.  Note that in-order
reductions, esp with the high VF resulting, are necessarily "bad", but
costing still thinks it's profitable:

t.c:8:12: note:  Cost model analysis:
  Vector inside of loop cost: 376
  Vector prologue cost: 0
  Vector epilogue cost: 336
  Scalar iteration cost: 80
  Scalar outside cost: 32
  Vector outside cost: 336

using -ffast-math you get nice code.  aarch64 doesn't vectorize the loop
at -O2 for me unless enabling SVE which has in-order reductions.

IMO a costing issue only, if it is slower than not vectorizing (didn't try
to actually measure).
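
For readers unfamiliar with the term, the following is a minimal sketch of why a floating-point reduction has to stay "in-order" unless -ffast-math permits reassociation. It is not the t.c from this report, which is not shown in the comment. The function names sum_in_order and sum_two_lanes are hypothetical, and the two-accumulator version only imitates by hand what a reassociating vectorizer might do with two lanes.

```c
#include <stddef.h>

/* Strict left-to-right sum.  Without -ffast-math the compiler must
   preserve this evaluation order, which is what makes the reduction
   "in-order". */
double sum_in_order(const double *a, size_t n)
{
    double r = 0.0;
    for (size_t i = 0; i < n; i++)
        r += a[i];
    return r;
}

/* Hand-written reassociation with two independent partial sums,
   roughly the shape of a 2-lane vector reduction.  The result can
   differ from sum_in_order because floating-point addition is not
   associative. */
double sum_two_lanes(const double *a, size_t n)
{
    double lane[2] = { 0.0, 0.0 };
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        lane[0] += a[i];
        lane[1] += a[i + 1];
    }
    double r = lane[0] + lane[1];   /* final cross-lane reduction */
    for (; i < n; i++)              /* scalar tail for odd n */
        r += a[i];
    return r;
}
```

Assuming IEEE double arithmetic with no excess precision (for example x86-64 or aarch64 without -ffast-math), the input { 1e16, 1.0, -1e16, 1.0 } sums to 1.0 in order but to 2.0 with the two-lane variant, which is why the vectorizer may only use the second form when reassociation is allowed.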