https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120751

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Of course, using an in-order reduction with two lanes, including building
the data vector from scalars, just to save 4 scalar multiplications is on
the border of profitability.  Per-stmt local costing is difficult here,
though.
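
For readers unfamiliar with the term, here is a rough sketch of what a
two-lane in-order reduction amounts to.  It is not the testcase from this PR
(which is not quoted here); the function names and the dot-product shape are
assumptions made purely for illustration.  The multiplies are done two at a
time, as one vector multiply would do them, but the additions still happen
one after another in source order, so only the multiplications are saved.

```c
#include <stddef.h>

/* Scalar reference: strict left-to-right accumulation. */
double dot_scalar(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hypothetical hand-written model of a two-lane in-order reduction.
   The two products stand in for a single two-lane vector multiply;
   the accumulation stays sequential and keeps the original order. */
double dot_inorder_2lane(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        /* "Vector" multiply: both lanes computed together. */
        double lane[2] = { a[i] * b[i], a[i + 1] * b[i + 1] };
        /* In-order reduction: lane 0 first, then lane 1. */
        sum += lane[0];
        sum += lane[1];
    }
    /* Scalar epilogue for an odd trailing element. */
    for (; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

Both functions add the products in the same order, which is why this form
needs no reassociation; the price is that the add chain is exactly as long
as in the scalar loop.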

There's another PR about in-order reductions being somewhat pointless, but
in a case where we save nothing but loads (which in this case we don't).
