https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91019
Bug ID: 91019
Summary: Missed optimization on sequential memcpy calls
Product: gcc
Version: 9.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: mserdarsanli at gmail dot com
Target Milestone: ---
#include <stdint.h>
#include <string.h>

void encode_v1(uint8_t *buf, uint64_t a1, uint16_t a2) {
    memcpy(buf, &a1, 6);    /* write bytes 0-5 from a1 */
    memcpy(buf+6, &a2, 2);  /* write bytes 6-7 from a2 */
}

void encode_v2(uint8_t *buf, uint64_t a1, uint16_t a2) {
    memcpy(buf, &a1, 8);    /* write bytes 0-7 from a1 */
    memcpy(buf+6, &a2, 2);  /* overwrite bytes 6-7 with a2 */
}
The two functions above should be equivalent: both pack the arguments into the buffer.
`encode_v1` copies 6 bytes of a1, then the 2 bytes of a2.
`encode_v2` copies all 8 bytes of a1, then overwrites the last two bytes with a2.
Functionally they are the same, yet v2 generates better assembly.
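A minimal sketch of a harness (assuming the two encoders above and a little-endian target such as x86-64) that confirms both produce the same bytes:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* encode_v1 and encode_v2 as defined above */

int main(void) {
    uint64_t a1 = 0x1122334455667788ULL;  /* arbitrary test values */
    uint16_t a2 = 0x99AA;
    uint8_t b1[8], b2[8];
    encode_v1(b1, a1, a2);
    encode_v2(b2, a1, a2);
    /* both functions write exactly bytes 0-7, so comparing 8 bytes suffices */
    printf("%s\n", memcmp(b1, b2, sizeof b1) == 0 ? "same" : "different");
    return 0;
}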
This is the assembly generated at -O3 (https://godbolt.org/z/i6TMiY):
encode_v1(unsigned char*, unsigned long, unsigned short):
        mov     eax, esi
        shr     rsi, 32
        mov     WORD PTR [rdi+6], dx
        mov     DWORD PTR [rdi], eax
        mov     WORD PTR [rdi+4], si
        ret
encode_v2(unsigned char*, unsigned long, unsigned short):
        mov     QWORD PTR [rdi], rsi
        mov     WORD PTR [rdi+6], dx
        ret