https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |aarch64
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Well, I think count is handled correctly even for SLP.  Given we accumulate
'short' to 'double' we likely perform 'count' adds to the m's here and those
are chained in a simple way.  We specifically avoid creating more
reduction variables because of register pressure issues with and without SLP
if possible.  Note when you have for example three scalar reductions we will
up the number of IVs to use with SLP, so using 'count' isn't always 100%
accurate, but in the case of the testcase it should be.
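
To illustrate the trade-off described above, here is a minimal sketch in C. It is
not the testcase from this PR; the function names sum_chained and sum_split
are hypothetical. The first function accumulates 'short' into a single 'double',
so every add depends on the previous one. The second splits the sum across two
accumulators, which shortens the dependency chain but keeps one more value live
in a register.

```c
#include <stddef.h>

/* Hypothetical example, not the PR's testcase: one reduction variable.
   Each add waits for the previous one, so the adds form a simple chain.  */
double sum_chained(const short *a, size_t n)
{
    double m = 0.0;
    for (size_t i = 0; i < n; i++)
        m += (double)a[i];
    return m;
}

/* Same sum with two reduction variables.  The two chains are independent,
   at the cost of one extra live accumulator.  Note that this changes the
   order in which the floating-point adds are performed.  */
double sum_split(const short *a, size_t n)
{
    double m0 = 0.0, m1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        m0 += (double)a[i];
        m1 += (double)a[i + 1];
    }
    if (i < n)              /* odd element count: fold in the tail */
        m0 += (double)a[i];
    return m0 + m1;
}
```

Whether the second form is faster depends on the add latency and on how many
registers the rest of the loop needs, which is the register pressure concern
raised above.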

But I'm not sure what "reduction-latency" tries to measure.
