https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294
--- Comment #28 from Mateusz Guzik <mjguzik at gmail dot com> ---
(In reply to H.J. Lu from comment #27)
> (In reply to Mateusz Guzik from comment #26)
> > 4 stores per loop is best
>
> Do you have data to show it?

I used to, but I'm out of this game.

However, this is what gcc is already emitting if you explicitly ask it for
unrolled loops, so I don't think this bit should be controversial.