https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124615

            Bug ID: 124615
           Summary: Missed ARM SVE vectorization of loop with branch and
                    64-bit integer division
           Product: gcc
           Version: 15.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kim.walisch at gmail dot com
  Target Milestone: ---

Hi,

I found that the C code below is not vectorized by GCC 15.2 (or trunk) on
AArch64 with the options -O3 -march=armv8.2-a+sve.

#include <stddef.h>
#include <stdint.h>

void batch_div_u32_u128(unsigned __int128 xp,
                        const uint64_t* __restrict in,
                        uint64_t* __restrict out,
                        size_t n)
{
  for (size_t i = 0; i < n; i++)
  {
    uint64_t m = in[i] + 1;
    if (xp >> 64)
        out[i] = ((uint64_t) xp) / m;
    else
        out[i] = (uint64_t)(xp / m);
  }
}
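
For comparison, the loop-invariant test `xp >> 64` can be hoisted out of the
loop by hand (loop unswitching), which leaves two branch-free loops that each
resemble the second example below. This is a minimal sketch: the name
`batch_div_unswitched` is hypothetical, and whether GCC then vectorizes both
loops has not been checked here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hand-unswitched variant of batch_div_u32_u128: the
   loop-invariant test on the high 64 bits of xp is evaluated once, so
   each loop body is branch-free. Results match the original loop
   element for element. */
void batch_div_unswitched(unsigned __int128 xp,
                          const uint64_t* __restrict in,
                          uint64_t* __restrict out,
                          size_t n)
{
  if (xp >> 64)
  {
    /* High half non-zero: divide the truncated low 64 bits,
       as in the original if-branch. */
    uint64_t lo = (uint64_t) xp;
    for (size_t i = 0; i < n; i++)
      out[i] = lo / (in[i] + 1);
  }
  else
  {
    /* High half zero: 128-bit division, as in the original else-branch.
       in[i] + 1 is computed in uint64_t first, exactly like m. */
    for (size_t i = 0; i < n; i++)
      out[i] = (uint64_t)(xp / (in[i] + 1));
  }
}
```

The condition is still tested at run time, but only once per call instead of
once per element.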

However, if the branch is removed, GCC 15.2 (and trunk) successfully
vectorizes the loop with ARM SVE, both with -O3 -march=armv8.2-a+sve and with
-O2 -march=armv8.2-a+sve:

#include <stddef.h>
#include <stdint.h>

void batch_div_u32_u128(uint64_t xp64,
                        const uint64_t* __restrict in,
                        uint64_t* __restrict out,
                        size_t n)
{
  for (size_t i = 0; i < n; i++)
  {
    uint64_t m = in[i] + 1;
    out[i] = xp64 / m;
  }
}
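
The two examples compute the same values: the else-branch of the first loop
only runs when xp < 2^64, where `(uint64_t)(xp / m)` equals
`(uint64_t)xp / m`, and the if-branch computes `(uint64_t)xp / m` directly.
The first loop is therefore equivalent to the second with
`xp64 = (uint64_t)xp`. The sketch below checks this on sample inputs; the
names `div_with_branch`, `div_branch_free` and `outputs_match` are
hypothetical, and inputs equal to UINT64_MAX are excluded because
`in[i] + 1` then wraps to 0.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Copy of the first loop from the report (with the branch). */
static void div_with_branch(unsigned __int128 xp, const uint64_t* in,
                            uint64_t* out, size_t n)
{
  for (size_t i = 0; i < n; i++)
  {
    uint64_t m = in[i] + 1;
    if (xp >> 64)
      out[i] = ((uint64_t) xp) / m;
    else
      out[i] = (uint64_t)(xp / m);
  }
}

/* Copy of the second loop, fed with the low 64 bits of xp. */
static void div_branch_free(uint64_t xp64, const uint64_t* in,
                            uint64_t* out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = xp64 / (in[i] + 1);
}

/* Returns true if both loops agree on every element of in[0..n).
   Assumes no in[i] == UINT64_MAX (m would be 0). Returns false for
   n > 64 because the scratch buffers hold 64 elements. */
bool outputs_match(unsigned __int128 xp, const uint64_t* in, size_t n)
{
  uint64_t a[64], b[64];
  if (n > 64)
    return false;
  div_with_branch(xp, in, a, n);
  div_branch_free((uint64_t) xp, in, b, n);
  for (size_t i = 0; i < n; i++)
    if (a[i] != b[i])
      return false;
  return true;
}
```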

Clang 22 does better on this code: it vectorizes the first loop (the one with
the branch) at -O3, -O2 and -Os with -march=armv8.2-a+sve.
