https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123225
Bug ID: 123225
Summary: [16 Regression] Overly-aggressive vectorization of
uncounted loops
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: victorldn at gcc dot gnu.org
Target Milestone: ---
Given the loop:
short *
foo (short *arr)
{
  unsigned int pos = 0;
  while(1)
    {
      arr++;
      if (*arr == 0)
        break;
    }
  return arr;
}
and the overly-aggressive vectorization of uncounted loops,
the following assembly code is generated for -march=armv9-a
at -O3:
.L5:
        lsl     x3, x2, 1
        add     x2, x2, 8
        ldr     q31, [x0, x3]
        cmpeq   p15.h, p7/z, z31.h, #0
        b.none  .L5
        add     x1, x1, x3
        .p2align 5,,15
.L6:
        ldrsh   w0, [x1, 2]!
        cbnz    w0, .L6
Which, when compared to its non-vectorized counterpart,
.L2:
        ldrsh   w1, [x0, 2]!
        cbnz    w1, .L2
        ret
is found to produce slower code.
This accounts for the regression seen in the xalancbmk{,_r}
benchmark for both AArch64 & x86_64 due to the vectorization of the
xercesc_2_7::ValueStore::contains(xercesc_2_7::FieldValueMap const*)
function.
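
A possible way to compare the two code paths on the reproducer is sketched
below. It assumes the compiler accepts `#pragma GCC novector` (documented for
GCC 14 and later) and that the pragma suppresses vectorization of this loop
form; neither has been verified against the regression described here. The
names foo_novector and check_same_result are hypothetical and exist only for
this illustration, and the unused `pos` variable is dropped.

```c
#include <assert.h>

/* Reproducer from the report: advance until a zero element is found.
   Note that arr[0] itself is never inspected.  */
short *
foo (short *arr)
{
  while (1)
    {
      arr++;
      if (*arr == 0)
        break;
    }
  return arr;
}

/* Same loop, with vectorization requested off for this loop only.
   Assumption: the compiler honours #pragma GCC novector here; a compiler
   that does not know the pragma will simply ignore it.  */
short *
foo_novector (short *arr)
{
#pragma GCC novector
  while (1)
    {
      arr++;
      if (*arr == 0)
        break;
    }
  return arr;
}

/* Hypothetical helper: both variants must return the same pointer.  */
void
check_same_result (short *arr)
{
  assert (foo (arr) == foo_novector (arr));
}
```

Building both functions at -O3 and comparing the generated assembly and
timings should show whether the slowdown follows the vectorized loop.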