https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122808

            Bug ID: 122808
           Summary: Missing CSE in array element addressing
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: xry111 at gcc dot gnu.org
  Target Milestone: ---

struct T
{
  int s1, s2, buf[10];
};

struct T TEST_STRUCT[20];

void
foo (int index, int data)
{
  if (TEST_STRUCT[index].s1 < data)
    TEST_STRUCT[index].s2 = data;
  TEST_STRUCT[index].s1++;
}

is compiled to (at -O3):

"foo(int, int)":
        movsxd  rdi, edi
        lea     rdx, [rdi+rdi*2]
        sal     rdx, 4
        mov     eax, DWORD PTR "TEST_STRUCT"[rdx]
        cmp     eax, esi
        jge     .L2
        mov     DWORD PTR "TEST_STRUCT"[rdx+4], esi
.L2:
        lea     rdx, [rdi+rdi*2]
        add     eax, 1
        sal     rdx, 4
        mov     DWORD PTR "TEST_STRUCT"[rdx], eax
        ret

Note that the calculation of &TEST_STRUCT[index] is done twice.  This may not
seem serious when the scaling takes only an lea and a shift, but we can easily
adjust the size of T so that the addressing uses the relatively costly mul
instruction, by changing "buf[10]" to "buf[111]".
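
For reference, a minimal sketch of the adjusted test case described above.
Only the array bound differs from the original.  The 452-byte figure in the
comment assumes a 4-byte int, as on x86-64; the static_assert itself only
checks that the struct has no padding.

```c
#include <assert.h>

/* Variant of the test case with buf[111] instead of buf[10].  */
struct T
{
  int s1, s2, buf[111];
};

/* Assuming a 4-byte int, sizeof (struct T) is (2 + 111) * 4 = 452, so
   scaling the index is no longer a single lea plus a shift.  */
static_assert (sizeof (struct T) == 113 * sizeof (int),
               "struct T is expected to have no padding");

struct T TEST_STRUCT[20];

void
foo (int index, int data)
{
  if (TEST_STRUCT[index].s1 < data)
    TEST_STRUCT[index].s2 = data;
  TEST_STRUCT[index].s1++;
}
```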

Clang gets this right (output for the "buf[111]" variant):

        movsxd  rax, edi
        imul    rcx, rax, 452
        lea     rdx, [rip + TEST_STRUCT]
        lea     rax, [rdx + rcx]
        mov     ecx, dword ptr [rcx + rdx]
        cmp     ecx, esi
        jge     .LBB0_2
        mov     dword ptr [rax + 4], esi
.LBB0_2:
        inc     ecx
        mov     dword ptr [rax], ecx
        ret
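
At the source level, computing the element address once corresponds roughly
to the hand-written form below.  This is only an illustrative sketch:
foo_hoisted is a hypothetical name, and whether GCC emits a single address
computation for it has not been checked here.

```c
struct T
{
  int s1, s2, buf[10];
};

struct T TEST_STRUCT[20];

/* Hypothetical hand-written equivalent of foo: the address of
   TEST_STRUCT[index] is computed once and reused for all accesses.  */
void
foo_hoisted (int index, int data)
{
  struct T *p = &TEST_STRUCT[index];

  if (p->s1 < data)
    p->s2 = data;
  p->s1++;
}
```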

There's a patch to "work around" this for LoongArch, basically by deferring the
expansion of the multiplication to split1, but I don't think it's the correct
solution, as the issue is clearly target-independent.
