https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294

--- Comment #25 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Mateusz Guzik from comment #24)
> I got the thing compiled against top of git.
> 
> with this as a testcase:
> void zero(char *buf)
> {
>         __builtin_memset(buf, 0, SIZE);
> }
> 
> compiled like so:
> ./xgcc -O2 -DSIZE=128 -mno-sse -c ~/zero.c && objdump --disassemble=zero zero.o
> 
> The compiler emits completely unrolled stores for sizes up to 128, which
> raises an eyebrow but is perhaps fine.

Which is better: unrolling up to 16 stores per iteration, or unrolling up
to 4 stores per iteration?

> 
> However, for 129 and higher I see the code going back to rep, which is not
> the expected state.
> 
> The expected behavior is an unrolled loop, 32 bytes per iteration, for
> sizes up to 256.

What happens for size > 256 bytes?
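
For reference, the "32 bytes per iteration" loop shape discussed above could
look roughly like the C sketch below. It assumes x86-64 with 8-byte
general-purpose stores (the -mno-sse case), so 32 bytes per iteration is 4
stores per iteration, and a fully unrolled 128-byte memset is 16 stores. The
name zero_unrolled32 is hypothetical, and this only illustrates the loop
shape; it is not what GCC actually emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not GCC output: zero `size` bytes with a loop
   that performs four 8-byte stores (32 bytes) per iteration.
   Assumption: `size` is a multiple of 32. */
void zero_unrolled32(char *buf, size_t size)
{
    const uint64_t z = 0;

    assert(size % 32 == 0);

    for (size_t off = 0; off < size; off += 32) {
        /* Fixed-size 8-byte memcpy calls sidestep alignment and
           aliasing concerns; compilers typically lower each one
           to a single 8-byte store. */
        memcpy(buf + off + 0,  &z, 8);
        memcpy(buf + off + 8,  &z, 8);
        memcpy(buf + off + 16, &z, 8);
        memcpy(buf + off + 24, &z, 8);
    }
}
```

With SIZE=256 this loop runs 8 iterations. Sizes that are not a multiple of
32 would need a tail sequence, which the sketch leaves out.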
