https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |aarch64
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Well, I think count is handled correctly even for SLP.  Given we accumulate
'short' to 'double' we likely perform 'count' adds to the m's here and those
are chained in a simple way.  We specifically avoid creating more reduction
variables because of register pressure issues with and without SLP if
possible.

Note when you have for example three scalar reductions we will up the number
of IVs to use with SLP, so using 'count' isn't always 100% accurate, but in
the case of the testcase it should be.

But I'm not sure what "reduction-latency" tries to measure.
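For illustration only, here is a minimal C sketch of the difference between a single chained reduction and one split across more reduction variables. It is a hypothetical loop, not the testcase attached to this PR, and the function names are made up. It shows the source-level shape of the trade-off and does not claim to show what the vectorizer emits.

```c
#include <stddef.h>

/* Hypothetical example, not the PR testcase: one reduction variable.
   Each add depends on the result of the previous add, so the adds
   form a single serial chain. */
double sum_chained(const short *a, size_t n)
{
    double m = 0.0;
    for (size_t i = 0; i < n; i++)
        m += (double)a[i];
    return m;
}

/* Same reduction written by hand with two accumulators.  The two
   chains are independent of each other, which shortens the dependency
   chain but needs an extra live register per added accumulator. */
double sum_two_accumulators(const short *a, size_t n)
{
    double m0 = 0.0, m1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        m0 += (double)a[i];
        m1 += (double)a[i + 1];
    }
    if (i < n)              /* odd element count: fold in the tail */
        m0 += (double)a[i];
    return m0 + m1;
}
```

Both functions return the same value for 'short' inputs of moderate length, since the partial sums are then exactly representable in 'double'. The second form only trades a shorter chain of dependent adds for more live accumulators.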