https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123449
Bug ID: 123449
Summary: Missed cost model check on partial vector epilog
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
On x86_64 with -march=znver5 we get
t.c:3:21: optimized: loop vectorized using 64 byte vectors and unroll factor 64
t.c:3:21: optimized: epilogue loop vectorized using masked 64 byte vectors and unroll factor 64
for the following testcase, but the vector epilog is executed even when
n == 1, despite
t.c:3:21: note: Cost model analysis:
Vector inside of loop cost: 64
Vector prologue cost: 36
Vector epilogue cost: 0
Scalar iteration cost: 32
Scalar outside cost: 32
Vector outside cost: 36
prologue iterations: 0
epilogue iterations: 0
Minimum number of vector iterations: 1
Calculated minimum iters for profitability: 3
t.c:3:21: note: Runtime profitability threshold = 3
t.c:3:21: note: Static estimate profitability threshold = 5
because
t.c:3:21: note: no need for a runtime choice between the scalar and vector loops
This allows the scalar loop to be elided (when not versioning for other
reasons, but even with versioning we do not include the scalar iteration bound
check). This can be harmful if in most cases the loop executes only a small
number of times.
void foo (int n, unsigned char * __restrict a, unsigned char *b)
{
  for (int i = 0; i < n; ++i)
    a[i] = b[i] + 7;
}
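
For illustration only, the kind of guard that appears to be missing can be
written by hand as below. This is a rough sketch and not what the vectorizer
emits: the names foo_guarded, foo_vector_path and PROFITABILITY_THRESHOLD are
made up, the value 3 is copied from the "Runtime profitability threshold" line
in the dump above, and the direction of the comparison (n below the threshold
takes the scalar path) is an assumption.

```c
/* Hypothetical threshold, copied from the dump line
   "Runtime profitability threshold = 3".  */
#define PROFITABILITY_THRESHOLD 3

/* Stand-in for the vectorized main loop plus the masked vector epilog.
   Written as a plain loop here; the compiler is free to vectorize it.  */
static void
foo_vector_path (int n, unsigned char *__restrict a, unsigned char *b)
{
  for (int i = 0; i < n; ++i)
    a[i] = b[i] + 7;
}

/* Same semantics as foo, but with an explicit iteration count check so
   that short trip counts never reach the vector code.  */
void
foo_guarded (int n, unsigned char *__restrict a, unsigned char *b)
{
  if (n < PROFITABILITY_THRESHOLD)
    {
      /* Scalar fallback.  Nothing in this sketch stops the compiler from
	 vectorizing this loop as well; it only shows the control flow.  */
      for (int i = 0; i < n; ++i)
	a[i] = b[i] + 7;
      return;
    }

  foo_vector_path (n, a, b);
}
```

With such a check, a call with n == 1 takes the scalar path instead of
entering the masked vector epilog.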