https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294
--- Comment #25 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Mateusz Guzik from comment #24)
> I got the thing compiled against top of git.
>
> with this as a testcase:
> void zero(char *buf)
> {
>         __builtin_memset(buf, 0, SIZE);
> }
>
> compiled like so:
> ./xgcc -O2 -DSIZE=128 -mno-sse -c ~/zero.c && objdump --disassemble=zero zero.o
>
> The compiler emits completely unrolled stores for sizes up to 128, which
> raises an eye-brow but is perhaps fine.

Which is better: unrolling to up to 16 stores per iteration, or to up to 4 stores per iteration?

>
> However, for 129 and higher I see the code going back to rep, which is not
> the expected state.
>
> The expected behavior is unrolled loops, 32 bytes per iteration, of up to
> 256.

What happens for size > 256 bytes?
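To make the "unrolled loops, 32 bytes per iteration" shape concrete: with -mno-sse only 8-byte general-purpose stores are available, so 32 bytes per iteration means 4 stores per iteration, while 16 stores per iteration would cover 128 bytes. The C sketch below is a hypothetical illustration of that loop shape only, not GCC's actual memset expansion; the function name `zero_unrolled32` and the tail handling are assumptions, and an optimizing compiler may turn this loop back into a memset call or `rep stos`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch (not GCC's expansion): zero `size` bytes with a
   loop that writes 32 bytes per iteration as four 8-byte stores,
   i.e. general-purpose registers only, as with -mno-sse. */
void zero_unrolled32(char *buf, size_t size)
{
    const uint64_t z = 0;
    size_t i = 0;

    /* Main loop: 4 x 8-byte stores = 32 bytes per iteration.
       memcpy avoids alignment and strict-aliasing problems. */
    for (; i + 32 <= size; i += 32) {
        memcpy(buf + i,      &z, 8);
        memcpy(buf + i + 8,  &z, 8);
        memcpy(buf + i + 16, &z, 8);
        memcpy(buf + i + 24, &z, 8);
    }
    /* Tail: remaining whole 8-byte chunks. */
    for (; i + 8 <= size; i += 8)
        memcpy(buf + i, &z, 8);
    /* Tail: leftover single bytes. */
    for (; i < size; i++)
        buf[i] = 0;
}
```

For a size of 203, for example, the sketch runs six 32-byte iterations (192 bytes), one 8-byte tail store, and three single-byte stores.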