https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122808
Bug ID: 122808
Summary: Missing CSE in array element addressing
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: xry111 at gcc dot gnu.org
Target Milestone: ---
struct T
{
  int s1, s2, buf[10];
};

struct T TEST_STRUCT[20];

void
foo (int index, int data)
{
  if (TEST_STRUCT[index].s1 < data)
    TEST_STRUCT[index].s2 = data;
  TEST_STRUCT[index].s1++;
}
is compiled by GCC at -O3 (x86-64, Intel syntax) to:
"foo(int, int)":
        movsxd  rdi, edi
        lea     rdx, [rdi+rdi*2]
        sal     rdx, 4
        mov     eax, DWORD PTR "TEST_STRUCT"[rdx]
        cmp     eax, esi
        jge     .L2
        mov     DWORD PTR "TEST_STRUCT"[rdx+4], esi
.L2:
        lea     rdx, [rdi+rdi*2]
        add     eax, 1
        sal     rdx, 4
        mov     DWORD PTR "TEST_STRUCT"[rdx], eax
        ret
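As a cross-check of the addressing arithmetic: with a 4-byte int (as on x86-64), sizeof(struct T) is 48, and the movsxd/lea/sal sequence computes index * 3 * 16 = index * 48, i.e. the byte offset of TEST_STRUCT[index]. The sketch below mirrors each instruction in C; offset_like_gcc and offset_expected are hypothetical helper names used only for illustration.

```c
#include <assert.h>

/* Same layout as in the report: 2 + 10 ints, i.e. 48 bytes with 4-byte int. */
struct T
{
  int s1, s2, buf[10];
};

/* Byte offset of element INDEX, computed the way the GCC output does.  */
static long
offset_like_gcc (int index)
{
  long x = (long) index;   /* movsxd rdi, edi */
  x = x + x * 2;           /* lea rdx, [rdi+rdi*2]: index * 3 */
  return x * 16;           /* sal rdx, 4 (a multiply here, so negative
                              values are well defined in C) */
}

/* The offset the C semantics of &TEST_STRUCT[index] call for.  */
static long
offset_expected (int index)
{
  return (long) index * (long) sizeof (struct T);
}
```

Both lea/sal pairs in the listing evaluate this same expression, which is the redundancy at issue.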
Note that &TEST_STRUCT[index] is computed twice: the lea/sal pair before .L2 is
repeated after it. With a single lea this may not seem serious, but the size of
T can easily be adjusted so that the addressing needs a relatively costly
multiply instruction: change "buf[10]" to "buf[111]", which makes
sizeof(struct T) 452 bytes.
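To make the missed optimization concrete, here is a source-level sketch of what the ideal code computes, using the enlarged buf[111] variant: the element address is formed once and reused for the load, the conditional store and the increment. foo_cse is a hypothetical name; this hand-written CSE only illustrates the desired result and is not a claim about how GCC compiles it.

```c
#include <assert.h>

/* Enlarged variant: 2 + 111 ints = 452 bytes with 4-byte int, so
   index * sizeof (struct T) is no longer a cheap lea/shift.  */
struct T
{
  int s1, s2, buf[111];
};

struct T TEST_STRUCT[20];

/* Original function from the report.  */
void
foo (int index, int data)
{
  if (TEST_STRUCT[index].s1 < data)
    TEST_STRUCT[index].s2 = data;
  TEST_STRUCT[index].s1++;
}

/* Hypothetical hand-CSE'd equivalent: &TEST_STRUCT[index] is computed
   once and reused, as in the Clang output.  */
void
foo_cse (int index, int data)
{
  struct T *p = &TEST_STRUCT[index];
  if (p->s1 < data)
    p->s2 = data;
  p->s1++;
}
```

Both functions have the same observable effect; the only difference is that the second one spells out the common subexpression the optimizer is expected to find.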
Clang gets this right (the output below is for the buf[111] variant, hence the
452-byte stride):
        movsxd  rax, edi
        imul    rcx, rax, 452
        lea     rdx, [rip + TEST_STRUCT]
        lea     rax, [rdx + rcx]
        mov     ecx, dword ptr [rcx + rdx]
        cmp     ecx, esi
        jge     .LBB0_2
        mov     dword ptr [rax + 4], esi
.LBB0_2:
        inc     ecx
        mov     dword ptr [rax], ecx
        ret
There is a patch that "works around" this for LoongArch, essentially by
deferring the expansion of the multiplication to the split1 pass, but I don't
think it is the right solution because the issue is clearly target-independent.