https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294
--- Comment #32 from Mateusz Guzik <mjguzik at gmail dot com> ---
For non-SIMD asm, a single mov instruction can move at most 8 bytes. Stock gcc resorts to rep movsq for sizes bigger than 40 bytes. Telling it not to use rep movsq results in loops of 4 movsq instructions (aka 32 bytes per iteration). A reasonable upper limit for still doing this instead of punting to a libcall is 256 bytes.

In the case of -mno-simd, I'm advocating for issuing the 32-byte (aka 4-store) loops for sizes up to 256 bytes and punting to a libcall otherwise. Fully unrolling these would raise numerous eyebrows due to the i-cache footprint, and I don't believe this is warranted.
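To make the proposed shape concrete, here is a minimal C sketch of the strategy described above: a loop moving 32 bytes per iteration with four 8-byte loads and stores, used for sizes up to 256 bytes, and a libcall beyond that. The names `copy_sketch` and `INLINE_COPY_MAX` are hypothetical, and the byte-wise tail handling is an assumption made for illustration. This is not what gcc emits, only a model of the control flow under discussion.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical threshold taken from the comment: above it, punt to libc. */
#define INLINE_COPY_MAX 256

/* Hypothetical helper modelling the proposal; not gcc's actual expansion. */
static void copy_sketch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n > INLINE_COPY_MAX) {
        memcpy(d, s, n);            /* the "libcall" path */
        return;
    }

    /* 32 bytes per iteration: four 8-byte loads, then four 8-byte stores.
       The fixed-size memcpy calls are alignment-safe and are typically
       compiled down to single 8-byte moves. */
    while (n >= 32) {
        uint64_t a, b, c, e;
        memcpy(&a, s,      8);
        memcpy(&b, s + 8,  8);
        memcpy(&c, s + 16, 8);
        memcpy(&e, s + 24, 8);
        memcpy(d,      &a, 8);
        memcpy(d + 8,  &b, 8);
        memcpy(d + 16, &c, 8);
        memcpy(d + 24, &e, 8);
        s += 32;
        d += 32;
        n -= 32;
    }

    /* Assumed tail handling: copy the remaining 0..31 bytes one at a time. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
}
```

With this shape the loop body stays the same size regardless of the copy length, which is the i-cache argument against full unrolling: a 256-byte copy runs the loop 8 times instead of expanding into 32 loads and 32 stores inline.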