https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294

--- Comment #32 from Mateusz Guzik <mjguzik at gmail dot com> ---
For non-SIMD asm you can move at most 8 bytes per mov instruction.

Stock gcc resorts to rep movsq for sizes bigger than 40 bytes. Telling it
not to use rep movsq results in loops of 4 movq instructions (i.e. 32 bytes
per iteration).

A reasonable upper limit for still doing this instead of punting to a libcall
is 256 bytes.

When SIMD is disabled (e.g. with -mno-sse) I'm advocating for issuing the
32-byte (i.e. 4-store) loops up to 256 bytes and punting to a libcall otherwise.

Fully unrolling these would raise numerous eyebrows due to i-cache footprint
and I don't believe this is warranted.
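
A rough C sketch of the strategy described above, for illustration only. It
is not what gcc emits: the compiler knows the size at compile time, while this
version checks it at run time. The names copy_small, load8, store8 and
INLINE_COPY_MAX are made up for this example, and the 256-byte cutoff is the
one proposed in this comment, not an existing gcc tunable.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed cutoff taken from the comment above; not a gcc parameter. */
#define INLINE_COPY_MAX 256

/* Hypothetical helpers: move one 8-byte word without aliasing problems.
   A fixed-size 8-byte memcpy is typically lowered to a single mov. */
static inline uint64_t load8(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

static inline void store8(unsigned char *p, uint64_t v)
{
    memcpy(p, &v, sizeof v);
}

/* Hypothetical name: copies n bytes between non-overlapping buffers. */
void copy_small(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n > INLINE_COPY_MAX) {
        memcpy(d, s, n);            /* punt to the libcall */
        return;
    }

    /* Main loop: 4 loads and 4 stores, 32 bytes per iteration. */
    while (n >= 32) {
        uint64_t a = load8(s);
        uint64_t b = load8(s + 8);
        uint64_t c = load8(s + 16);
        uint64_t e = load8(s + 24);
        store8(d, a);
        store8(d + 8, b);
        store8(d + 16, c);
        store8(d + 24, e);
        s += 32;
        d += 32;
        n -= 32;
    }

    /* Tail: remaining whole 8-byte words, then single bytes. */
    while (n >= 8) {
        store8(d, load8(s));
        s += 8;
        d += 8;
        n -= 8;
    }
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
}
```

With a 256-byte cap the main loop runs at most 8 times, so the code size stays
fixed regardless of the copy size, which is the i-cache argument against full
unrolling. Note that an optimizing compiler may itself turn loops like these
back into a memcpy call.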
