https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124615
Bug ID: 124615
Summary: Missed ARM SVE vectorization of loop with branch and
64-bit integer division
Product: gcc
Version: 15.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kim.walisch at gmail dot com
Target Milestone: ---
Hi,
I found that GCC 15.2 (and trunk) does not vectorize the C code below on
AArch64 when compiled with the options -O3 -march=armv8.2-a+sve.
#include <stddef.h>
#include <stdint.h>

void batch_div_u32_u128(unsigned __int128 xp,
                        const uint64_t* __restrict in,
                        uint64_t* __restrict out,
                        size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        uint64_t m = in[i] + 1;
        if (xp >> 64)
            out[i] = ((uint64_t) xp) / m;
        else
            out[i] = (uint64_t)(xp / m);
    }
}
However, if we remove the branch, GCC 15.2 (and trunk) successfully vectorizes
the loop with ARM SVE, both with -O3 -march=armv8.2-a+sve and with
-O2 -march=armv8.2-a+sve.
#include <stddef.h>
#include <stdint.h>

void batch_div_u32_u128(uint64_t xp64,
                        const uint64_t* __restrict in,
                        uint64_t* __restrict out,
                        size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        uint64_t m = in[i] + 1;
        out[i] = xp64 / m;
    }
}
Clang 22 does better on this code: it vectorizes the first loop (the one
with the branch) at -O3, -O2 and -Os with -march=armv8.2-a+sve.
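
For reference, the condition xp >> 64 in the first test case does not depend
on i, so the branch can be hoisted out of the loop by hand. The following is
only a sketch of such a source-level rewrite. The name batch_div_unswitched is
hypothetical and not part of the original test case, and whether GCC
vectorizes both resulting loops has not been verified here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hand-unswitched variant of the first test case.
   The loop-invariant test (xp >> 64) is evaluated once, outside the loop. */
void batch_div_unswitched(unsigned __int128 xp,
                          const uint64_t* __restrict in,
                          uint64_t* __restrict out,
                          size_t n)
{
    if (xp >> 64)
    {
        /* Same as the "if" arm of the original: divide the low 64 bits. */
        uint64_t lo = (uint64_t) xp;
        for (size_t i = 0; i < n; i++)
        {
            uint64_t m = in[i] + 1;
            out[i] = lo / m;
        }
    }
    else
    {
        /* Same as the "else" arm of the original: 128-bit division. */
        for (size_t i = 0; i < n; i++)
        {
            uint64_t m = in[i] + 1;
            out[i] = (uint64_t)(xp / m);
        }
    }
}
```

This computes the same values as the first test case for every input, since
each arm of the branch is copied unchanged into its own loop.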