On Tue, Jul 29, 2025 at 10:36 AM Tamar Christina
<tamar.christ...@arm.com> wrote:
>
> > -----Original Message-----
> > From: Pengfei Li <pengfei....@arm.com>
> > Sent: Monday, July 21, 2025 12:07 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: rguent...@suse.de; s...@gentoo.org; Tamar Christina
> > <tamar.christ...@arm.com>; Pengfei Li <pengfei....@arm.com>
> > Subject: [PATCH] vect: Add missing skip-vector check for peeling with 
> > versioning
> > [PR121020]
> >
> > This fixes a miscompilation issue introduced by the enablement of
> > combined loop peeling and versioning. A test case that reproduces the
> > issue is included in the patch.
> >
> > When performing loop peeling, GCC usually inserts a skip-vector check.
> > This ensures that after peeling, there are enough remaining iterations
> > to enter the main vectorized loop. Previously, the check was omitted if
> > loop versioning for alignment was applied. It was safe before because
> > versioning and peeling for alignment were mutually exclusive.
> >
> > However, with combined peeling and versioning enabled, this is not safe
> > any more. A loop may be peeled and versioned at the same time. Without
> > the skip-vector check, the main vectorized loop can be entered even if
> > its iteration count is zero. This can cause the loop running many more
> > iterations than needed, resulting in incorrect results.
> >
> > To fix this, the patch updates the condition of omitting the skip-vector
> > check to when versioning is performed alone without peeling.
> >
> > This patch is bootstrapped and regression-tested on x86_64-linux-gnu,
> > arm-linux-gnueabihf and aarch64-linux-gnu.
> >
> >       PR tree-optimization/121020
> >
> > gcc/ChangeLog:
> >
> >       * tree-vect-loop-manip.cc (vect_do_peeling): Update the
> >         condition of omitting the skip-vector check.
> >       * tree-vectorizer.h (LOOP_VINFO_USE_VERSIONING_WITHOUT_PEELING):
> >         Add a helper macro.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.dg/vect/vect-early-break_138-pr121020.c: New test.
> > ---
> >  .../vect/vect-early-break_138-pr121020.c      | 52 +++++++++++++++++++
> >  gcc/tree-vect-loop-manip.cc                   |  2 +-
> >  gcc/tree-vectorizer.h                         |  4 ++
> >  3 files changed, 57 insertions(+), 1 deletion(-)
> >  create mode 100644 
> > gcc/testsuite/gcc.dg/vect/vect-early-break_138-pr121020.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_138-pr121020.c
> > b/gcc/testsuite/gcc.dg/vect/vect-early-break_138-pr121020.c
> > new file mode 100644
> > index 00000000000..86661e445a8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_138-pr121020.c
> > @@ -0,0 +1,52 @@
> > +/* PR tree-optimization/121020 */
> > +/* { dg-do run } */
>
> Drop the run from here, the testsuite will figure out what to do on its own.
>
> > +/* { dg-options "-O3 --vect-cost-model=unlimited" } */
> > +/* { dg-additional-options "-march=znver2" { target x86_64-*-* i?86-*-* } 
> > } */
> > +/* { dg-require-effective-target mmap } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +
> > +#include <stdint.h>
> > +#include <stdio.h>
> > +#include <sys/mman.h>
> > +#include <unistd.h>
>
> Since it's a runtime test this needs to include
>
> #include "tree-vect.h"
>
> > +
> > +__attribute__((noipa))
> > +bool equal (uint64_t *restrict p, uint64_t *restrict q, int length)
> > +{
> > +  for (int i = 0; i < length; i++) {
> > +    if (*(p + i) != *(q + i))
> > +      return false;
> > +  }
> > +  return true;
> > +}
> > +
> > +int main ()
> > +{
>
> And call  check_vect () to properly initialize all target vectors.
>
> Other than that patch looks good to me but you still need Richi's approval.

OK  with the testsuite adjustment mentioned above.

And sorry for the delay,
Richard.

> Thanks,
> Tamar
>
> > +  long pgsz = sysconf (_SC_PAGESIZE);
> > +  if (pgsz == -1) {
> > +    fprintf (stderr, "sysconf failed\n");
> > +    return 0;
> > +  }
> > +
> > +  /* Allocate a whole page of memory.  */
> > +  void *mem = mmap (NULL, pgsz, PROT_READ | PROT_WRITE,
> > +                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > +  if (mem == MAP_FAILED) {
> > +    fprintf (stderr, "mmap failed\n");
> > +    return 0;
> > +  }
> > +  uint64_t *p1 = (uint64_t *) mem;
> > +  uint64_t *p2 = (uint64_t *) mem + 32;
> > +
> > +  /* The first 16 elements pointed to by p1 and p2 are the same.  */
> > +  for (int i = 0; i < 32; i++) {
> > +    *(p1 + i) = 0;
> > +    *(p2 + i) = (i < 16 ? 0 : -1);
> > +  }
> > +
> > +  /* All calls to equal should return true.  */
> > +  for (int len = 0; len < 16; len++) {
> > +    if (!equal (p1 + 1, p2 + 1, len))
> > +      __builtin_abort();
> > +  }
> > +}
> > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > index 2d01a4b0ed1..7fcbc1ad2eb 100644
> > --- a/gcc/tree-vect-loop-manip.cc
> > +++ b/gcc/tree-vect-loop-manip.cc
> > @@ -3295,7 +3295,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree 
> > niters,
> > tree nitersm1,
> >    bool skip_vector = (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> >                     ? maybe_lt (LOOP_VINFO_INT_NITERS (loop_vinfo),
> >                                 bound_prolog + bound_epilog)
> > -                   : (!LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT
> > (loop_vinfo)
> > +                   : (!LOOP_VINFO_USE_VERSIONING_WITHOUT_PEELING
> > (loop_vinfo)
> >                        || vect_epilogues));
> >
> >    /* Epilog loop must be executed if the number of iterations for epilog
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > index 799d5fed7a9..bbfeecacf5b 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -1168,6 +1168,10 @@ public:
> >     || LOOP_REQUIRES_VERSIONING_FOR_NITERS (L)                \
> >     || LOOP_REQUIRES_VERSIONING_FOR_SIMD_IF_COND (L))
> >
> > +#define LOOP_VINFO_USE_VERSIONING_WITHOUT_PEELING(L) \
> > +  ((L)->may_misalign_stmts.length () > 0             \
> > +   && !LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT (L))
> > +
> >  #define LOOP_VINFO_NITERS_KNOWN_P(L)          \
> >    (tree_fits_shwi_p ((L)->num_iters) && tree_to_shwi ((L)->num_iters) > 0)
> >
> > --
> > 2.43.0
>

Reply via email to