https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123225

            Bug ID: 123225
           Summary: [16 Regression] Overly-aggressive vectorization of
                    uncounted loops
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: victorldn at gcc dot gnu.org
  Target Milestone: ---

Given the loop:

short *
foo (short *arr)
{
  unsigned int pos = 0;
  while(1)
    {
      arr++;
      if (*arr == 0)
        break;      
    }
  return arr;  
}

GCC's overly aggressive vectorization of uncounted loops turns it into
the following assembly at -O3 with -march=armv9-a:

.L5:
        lsl     x3, x2, 1
        add     x2, x2, 8
        ldr     q31, [x0, x3]
        cmpeq   p15.h, p7/z, z31.h, #0
        b.none  .L5
        add     x1, x1, x3
        .p2align 5,,15
.L6:
        ldrsh   w0, [x1, 2]!
        cbnz    w0, .L6

This has been found to be slower than its non-vectorized
counterpart:

.L2:
        ldrsh   w1, [x0, 2]!
        cbnz    w1, .L2
        ret
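The following C sketch models the structure of the vectorized loop. It reflects our reading of the excerpt and is not GCC's output. In .L5, x2 advances by 8 per iteration and each q-register load covers eight 16-bit lanes, so every step tests 8 shorts. Once a block contains a zero, the scalar loop .L6 rescans that block from its first element. This assumes that on entry x1 holds arr and x0 holds arr + 1, which the setup code outside the excerpt would have to establish. The names foo_blocked, len and reads are hypothetical. The bounds assert stands in for whatever the real code does to avoid reading past the buffer, which the excerpt does not show.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the .L5/.L6 code above; not GCC output.
   `len` and `*reads` exist only for this sketch: `*reads` counts
   element reads, not instructions.  ASSUMPTION: every 8-element block
   read must stay inside arr[0..len-1]; the assert enforces that here,
   whereas the real code must guarantee it by means not in the excerpt. */
static short *
foo_blocked (short *arr, size_t len, unsigned *reads)
{
  short *base = arr + 1;            /* the source loop never reads arr[0] */
  size_t i = 0;                     /* element offset of the current block */
  for (;;)
    {
      assert (1 + i + 8 <= len);    /* whole block must be in bounds */
      int any_zero = 0;
      for (int lane = 0; lane < 8; lane++)  /* models ldr q31 + cmpeq */
        any_zero |= (base[i + lane] == 0);
      *reads += 8;
      if (any_zero)
        break;                      /* models b.none falling through */
      i += 8;
    }
  /* Scalar epilogue (.L6): restart at the first element of the block. */
  short *p = arr + i;
  do
    {
      p++;                          /* models ldrsh w0, [x1, 2]! */
      ++*reads;
    }
  while (*p != 0);
  return p;
}
```

Suppose the first zero is at arr[3] in a 16-element array. The model then returns arr + 3 after 11 element reads: 8 for the block test and 3 for the rescan. The scalar loop would read only 3 elements. Whenever the terminator falls inside the first block, the vectorized path does all of the scalar loop's work plus one block test.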

This accounts for the regression seen in the xalancbmk{,_r}
benchmark on both AArch64 and x86_64, which is caused by the
vectorization of the
xercesc_2_7::ValueStore::contains(xercesc_2_7::FieldValueMap const*)
function.
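As a rough cross-check on the two assembly excerpts, the sketch below writes their dynamic instruction counts as a function of k >= 1, the index of the first zero element (arr[k] == 0). It is back-of-the-envelope arithmetic, not a performance model. It counts only the instructions shown and ignores the code before .L5 and .L2, the ret, any nop padding emitted by .p2align, and all latency and throughput effects. The function names are hypothetical.

```c
#include <assert.h>

/* Instructions executed by the scalar .L2 loop: ldrsh + cbnz per element. */
static unsigned
scalar_insns (unsigned k)
{
  return 2 * k;
}

/* Instructions executed by the vectorized excerpt (.L5, then .L6).  */
static unsigned
vector_insns (unsigned k)
{
  assert (k >= 1);                  /* arr[0] is never examined */
  unsigned blocks = (k + 7) / 8;    /* .L5 iterations, 8 lanes each */
  unsigned tail = (k - 1) % 8 + 1;  /* .L6 iterations within the last block */
  return 5 * blocks                 /* lsl, add, ldr, cmpeq, b.none */
         + 1                        /* add x1, x1, x3 */
         + 2 * tail;                /* ldrsh + cbnz */
}
```

Writing k = 8m + r with 1 <= r <= 8, the vectorized count is 5m + 6 + 2r and the scalar count is 16m + 2r. For every k from 1 to 8, the vectorized excerpt therefore executes exactly 6 more instructions than the scalar loop (2k + 6 versus 2k). From k = 9 onward it executes fewer. By this crude measure, the vector path only helps when the terminator lies beyond the first 8-element block.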
