https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294
--- Comment #27 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Mateusz Guzik from comment #26)
> 4 stores per loop is best

Do you have data to show it?

> it is libcalls after 256, which is fine