https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124458
Bug ID: 124458
Summary: [16 Regression] Missed Loop Vectorization at -O3 (vs. -O2)
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: xintong.zhou1 at uwaterloo dot ca
Target Milestone: ---
Compiler Explorer: https://godbolt.org/z/vos3nrT63
Given this code:
```
short f, i;
int g[19];
int p, q;
void test() {
for (short j; j < 5; j++)
for (; 0 < (p ?: ({ int d = p; d > d; }));)
for (int l; l < 4; l += 70)
for (int m; m < 7069392; m += 3)
for (short r; r < 9; r += 3)
for (int n = q; n; n += 3)
for (short o = 0; o < 19; o += 2) {
g[n] &= i;
f = (f < 0 ? f : 0);
}
}
int main() {}
```
gcc-trunk applies loop vectorization at -O2:
```
.L9:
movq xmm1, QWORD PTR [rbx]
movq xmm0, QWORD PTR [rdx]
movq xmm3, QWORD PTR [rdi]
movss xmm1, xmm0
movq xmm0, QWORD PTR [r11]
pand xmm1, xmm2
movss xmm0, xmm3
movd DWORD PTR [rdx], xmm1
pshufd xmm4, xmm1, 0xe5
pand xmm0, xmm2
movd DWORD PTR [rdx+12], xmm4
pshufd xmm5, xmm0, 0xe5
movd DWORD PTR [rdi], xmm0
movd DWORD PTR [rdx+36], xmm5
cmp r8d, r10d
je .L8
movsxd rax, ebp
```
but misses the vectorization at -O3:
```
.L20:
and DWORD PTR "g"[0+rcx*4], eax
test r15d, r15d
je .L7
.L6:
and DWORD PTR "g"[0+rsi*4], eax
test r10d, r10d
je .L62
and DWORD PTR "g"[0+rdi*4], eax
test ebp, ebp
je .L63
and DWORD PTR "g"[0+r8*4], eax
test r12d, r12d
je .L64
and DWORD PTR "g"[0+r9*4], eax
test r14d, r14d
je .L65
and DWORD PTR "g"[0+rbx*4], eax
cmp DWORD PTR [rsp-4], 0
je .L66
add edx, 3
and DWORD PTR "g"[0+r11*4], eax
cmp dx, 8
jle .L20
```
It seems that the loop unswitching pass, which is enabled at -O3 but not at -O2, prevents the loop from being vectorized.
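For context, loop unswitching hoists a loop-invariant condition out of a loop and duplicates the loop once per branch. The sketch below is a generic, hand-written illustration of that transformation. It is not GCC's output for the test case above, and all function names in it are hypothetical. The hypothesis could probably be checked by rebuilding with `-O3 -fno-unswitch-loops` and comparing the generated code; I have not confirmed the result of that here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example: the test on `flag` does not depend on k,
   so it is loop-invariant but still evaluated inside the loop. */
void and_or_clear(int *a, size_t n, int mask, int flag)
{
    for (size_t k = 0; k < n; k++) {
        if (flag)
            a[k] &= mask;
        else
            a[k] = 0;
    }
}

/* The same function after manual unswitching: the invariant test is
   hoisted out and the loop is duplicated, one copy per branch. */
void and_or_clear_unswitched(int *a, size_t n, int mask, int flag)
{
    if (flag) {
        for (size_t k = 0; k < n; k++)
            a[k] &= mask;
    } else {
        for (size_t k = 0; k < n; k++)
            a[k] = 0;
    }
}

/* Self-check: both variants must leave identical array contents
   for either value of the invariant flag. */
void check_variants_agree(void)
{
    for (int flag = 0; flag <= 1; flag++) {
        int x[8] = {0xff, 0x12, 0x0f, 0, -1, 7, 0x100, 42};
        int y[8];
        memcpy(y, x, sizeof x);
        and_or_clear(x, 8, 0x0f, flag);
        and_or_clear_unswitched(y, 8, 0x0f, flag);
        assert(memcmp(x, y, sizeof x) == 0);
    }
}
```

The unswitched form has simpler loop bodies but more loop copies. When this is applied to a deep nest such as the one in the test case, the resulting control flow may be what the vectorizer no longer recognizes, though that is only a guess from the assembly.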