https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77456

--- Comment #8 from petschy at gmail dot com ---
I created two other bugs (bug 77482 for the segfault and bug 77485 for the DSE
issue). As I noted in the latter, I'm a bit confused about the store merging
and about what change Kyrill's patch will make: the version compiled with gcc
7.0 already merges some of the stores using xmm0, so the problem is not that no
merging occurs, but that it occurs inconsistently.

Furthermore, there must be a threshold on the amount of data above which the
code generator should decide that it is more efficient to store the bytes in
.rodata and memcpy them to the destination than to emit multiple store insns,
even merged ones.

This logic kicks in for baz_sized(), but not for baz(). Interestingly, in the
latter no xmm0 was used at all: every single byte is movb'd after the memset,
whereas foo() and bar(), with smaller data, used xmm0 too.

Dump of assembler code for function baz():
   0x0000000000400800 <+0>:     sub    $0x8,%rsp
   0x0000000000400804 <+4>:     mov    $0x4e20,%edx
   0x0000000000400809 <+9>:     xor    %esi,%esi
   0x000000000040080b <+11>:    callq  0x4004f0 <memset@plt>
   0x0000000000400810 <+16>:    movb   $0x30,(%rax)
   0x0000000000400813 <+19>:    movb   $0x20,0x1(%rax)
   0x0000000000400817 <+23>:    movb   $0x31,0x2(%rax)
   0x000000000040081b <+27>:    movb   $0x20,0x3(%rax)
   0x000000000040081f <+31>:    movb   $0x32,0x4(%rax)
....
   0x0000000000422674 <+138868>:        movb   $0x32,0x4db3(%rax)
   0x000000000042267b <+138875>:        movb   $0x30,0x4db4(%rax)
   0x0000000000422682 <+138882>:        movb   $0x30,0x4db5(%rax)
   0x0000000000422689 <+138889>:        add    $0x8,%rsp
   0x000000000042268d <+138893>:        retq   

Even if the byte stores were merged into 64-bit stores, the function would
still be huge; a memcpy would be far better.
