https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124458

            Bug ID: 124458
           Summary: [16 Regression] Missed Loop Vectorization at -O3 (vs.
                    -O2)
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: xintong.zhou1 at uwaterloo dot ca
  Target Milestone: ---

Compiler Explorer: https://godbolt.org/z/vos3nrT63

Given this code:

```
short f, i;
int g[19];
int p, q;
void test() {
  for (short j; j < 5; j++)
    for (; 0 < (p ?: ({ int d = p; d > d; }));)
      for (int l; l < 4; l += 70)
        for (int m; m < 7069392; m += 3)
          for (short r; r < 9; r += 3)
            for (int n = q; n; n += 3)
              for (short o = 0; o < 19; o += 2) {
                g[n] &= i;
                f = (f < 0 ? f : 0);
              }
}
int main() {}
```

gcc-trunk applies loop vectorization at -O2:

.L9:
        movq    xmm1, QWORD PTR [rbx]
        movq    xmm0, QWORD PTR [rdx]
        movq    xmm3, QWORD PTR [rdi]
        movss   xmm1, xmm0
        movq    xmm0, QWORD PTR [r11]
        pand    xmm1, xmm2
        movss   xmm0, xmm3
        movd    DWORD PTR [rdx], xmm1
        pshufd  xmm4, xmm1, 0xe5
        pand    xmm0, xmm2
        movd    DWORD PTR [rdx+12], xmm4
        pshufd  xmm5, xmm0, 0xe5
        movd    DWORD PTR [rdi], xmm0
        movd    DWORD PTR [rdx+36], xmm5
        cmp     r8d, r10d
        je      .L8
        movsxd  rax, ebp

but misses the vectorization at -O3:

.L20:
        and     DWORD PTR "g"[0+rcx*4], eax
        test    r15d, r15d
        je      .L7
.L6:
        and     DWORD PTR "g"[0+rsi*4], eax
        test    r10d, r10d
        je      .L62
        and     DWORD PTR "g"[0+rdi*4], eax
        test    ebp, ebp
        je      .L63
        and     DWORD PTR "g"[0+r8*4], eax
        test    r12d, r12d
        je      .L64
        and     DWORD PTR "g"[0+r9*4], eax
        test    r14d, r14d
        je      .L65
        and     DWORD PTR "g"[0+rbx*4], eax
        cmp     DWORD PTR [rsp-4], 0
        je      .L66
        add     edx, 3
        and     DWORD PTR "g"[0+r11*4], eax
        cmp     dx, 8
        jle     .L20


It seems that the loop unswitching pass, which is enabled at -O3, prevents the loop from being vectorized.
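
For readers less familiar with the pass, here is a minimal hand-written sketch of what loop unswitching does in general. It is only an illustration and not the transformation GCC actually applies to the test case above. The names `mask_all` and `mask_all_unswitched` are made up for this example.

```c
#include <stddef.h>

/* Before unswitching: the loop-invariant flag `enabled` is tested on
   every iteration, so the loop body contains a branch. */
void mask_all(int *a, size_t n, int mask, int enabled) {
  for (size_t k = 0; k < n; k++) {
    if (enabled)
      a[k] &= mask;
  }
}

/* After unswitching (done by hand here): the invariant test is hoisted
   out of the loop and the loop is duplicated per branch. The "false"
   copy has an empty body, so it disappears entirely. */
void mask_all_unswitched(int *a, size_t n, int mask, int enabled) {
  if (enabled) {
    for (size_t k = 0; k < n; k++)
      a[k] &= mask;
  }
}
```

Both functions compute the same result. Since `-funswitch-loops` is among the options enabled at -O3, comparing `-O3` against `-O3 -fno-unswitch-loops` should show whether this pass is responsible for the missed vectorization.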
