> -----Original Message-----
> From: Richard Biener <[email protected]>
> Sent: 16 December 2025 08:14
> To: Tamar Christina <[email protected]>
> Cc: [email protected]; nd <[email protected]>; [email protected];
> [email protected]
> Subject: Re: [PATCH][vect]: generate new scalar early break IV as unsigned
> [PR123089]
>
> On Mon, Dec 15, 2025 at 6:27 PM Tamar Christina
> <[email protected]> wrote:
> >
> > In the PR we see that the new scalar IV tricks other passes into thinking
> > there's an overflow due to the use of a signed counter:
> >
> > The loop is known to iterate 8191 times and we have a VF of 8 and it starts
> > at 2.
> >
> > The codegen out of the vectorizer is the same as before, except we now have
> > a scalar variable counting the scalar iteration count vs a vector one.
> >
> > i.e. we have
> >
> > _45 = _39 + 8;
> >
> > vs
> >
> > _46 = _45 + { 16, 16, 16, 16, ... }
> >
> > we pick a lower VF now since costing allows it, but that's not important.
> >
> > When we get to cunroll, since the value is now scalar, it sees that 8 * 8191
> > would overflow a signed short and so it changes the loop bounds to the
> > largest possible signed value and then uses this to elide the
> > ivtmp_50 < 8191 check as always true, and so you get an infinite loop:
> >
> > Analyzing # of iterations of loop 1
> > exit condition [1, + , 1](no_overflow) < 8191
> > bounds on difference of bases: 8190 ... 8190
> > result:
> > # of iterations 8190, bounded by 8190
> > Statement (exit)if (ivtmp_50 < 8191)
> > is executed at most 8190 (bounded by 8190) + 1 times in loop 1.
> > Induction variable (signed short) 8 + 8 * iteration does not wrap in statement
> > _45 = _39 + 8;
> > in loop 1.
> > Statement _45 = _39 + 8;
> > is executed at most 4094 (bounded by 4094) + 1 times in loop 1.
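> >
> > To see where the "at most 4094" bound comes from, here is a
> > back-of-the-envelope sketch in plain C (not vectorizer code, names made
> > up): with a signed short IV starting at 8 and stepping by 8, assuming no
> > signed overflow caps the number of steps at (SHRT_MAX - 8) / 8 = 4094,
> > far below the real 8191 iterations.
> >
> > #include <limits.h>
> > #include <stdio.h>
> >
> > int
> > main (void)
> > {
> >   /* Shape of the new scalar IV as in the dump: starts at 8, steps by 8.  */
> >   int start = 8, step = 8;
> >   /* If signed short overflow is assumed impossible, the IV can take at
> >      most this many steps before it would exceed SHRT_MAX.  */
> >   int max_steps = (SHRT_MAX - start) / step;  /* (32767 - 8) / 8 == 4094 */
> >   printf ("assumed max steps: %d, actual iterations: %d\n", max_steps, 8191);
> >   return 0;
> > }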
> >
> > The signed type was originally chosen because of the negative offset we use
> > when adjusting for peeling for alignment with masks.  However this then
> > introduces issues, as we see here, with signed overflow.  This patch instead
> > uses an unsigned type, and adjusts the MAX to always be a signed max.  Since
> > unsigned underflow is defined and wraps around, the negative offset for the
> > PFA isn't an issue.
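> >
> > For illustration, a minimal sketch in plain C of why the clamp moves to
> > the signed type (hypothetical helper, not the vectorizer code; the
> > tree-vect-loop.cc hunk below does the analogous thing with gimple_convert
> > and MAX_EXPR): once the IV is unsigned, the negative PFA offset wraps to
> > a huge value, so a MAX against zero only has an effect after viewing the
> > value as signed again.
> >
> > #include <stdio.h>
> >
> > /* Hypothetical clamp: view the (now unsigned) IV value as its signed
> >    counterpart, take the max with 0, convert back.  A MAX done directly
> >    in the unsigned type would be a no-op on the wrapped value.  */
> > static unsigned short
> > clamp_pfa_iv (unsigned short iv)
> > {
> >   short s = (short) iv;  /* wrapped values become negative again
> >                             (implementation-defined, modular on GCC) */
> >   return (unsigned short) (s > 0 ? s : 0);
> > }
> >
> > int
> > main (void)
> > {
> >   /* E.g. a start of 3 minus a peeling skip of 5 wraps to 65534.  */
> >   unsigned short iv = (unsigned short) (3 - 5);
> >   printf ("raw %u, clamped %u\n", (unsigned) iv, (unsigned) clamp_pfa_iv (iv));
> >   return 0;
> > }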
>
> I think there might be the situation that the scalar niter does not
> fit a signed niter type and thus using a signed MAX could be wrong?
> For loop masking we have some elaborate code to choose the scalar
> niter IV type; I suppose we could re-use that here?
I guess you mean LOOP_VINFO_RGROUP_IV_TYPE.  Sure, I'll use that.
> The alignment peeling offset can also make an unsigned IV not enough
> to cover all scalar iterations.  With VF == 1 the current vector IV
> might also be prone to overflow for the degenerate niter == UINT_MAX
> case when doing mask peeling for alignment?
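>
> (Spelled out with made-up numbers, purely for illustration: an IV that has
> to cover the peeling skip on top of all scalar iterations needs values up
> to niter + skip, which no longer fits the unsigned niter type once niter
> itself is already UINT_MAX.)
>
> #include <limits.h>
> #include <stdio.h>
>
> int
> main (void)
> {
>   unsigned int niter = UINT_MAX;  /* degenerate scalar iteration count */
>   unsigned int skip = 3;          /* peeling-for-alignment skip, made up */
>   unsigned int last = niter + skip;  /* wraps around instead of reaching
>                                         niter + skip */
>   printf ("needed %u + %u, wrapped to %u\n", niter, skip, last);
>   return 0;
> }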
>
> It might be good to try adding some test coverage for the degenerate
> cases, possibly on -m32 only so as to limit run-time (maybe w/o any
> memory access, to avoid large memory use as well).
Will do.
Tamar
>
> Richard.
>
> > Bootstrapped and regtested on aarch64-none-linux-gnu,
> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > -m32, -m64 with no issues.
> >
> > Pushed.
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR tree-optimization/123089
> > * tree-vect-loop-manip.cc (vect_do_peeling): Use unsigned type for
> > scalar IV.
> > * tree-vect-loop.cc
> > (vect_update_ivs_after_vectorizer_for_early_breaks):
> > Use signed MAX for PFA offset update.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR tree-optimization/123089
> > * gcc.dg/vect/vect-early-break_141-pr123089.c: New test.
> >
> > ---
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_141-pr123089.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_141-pr123089.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..431edbfbde6731e205788495a93d90e252e717f0
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_141-pr123089.c
> > @@ -0,0 +1,40 @@
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break_hw } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-require-effective-target avx2_runtime { target { i?86-*-* x86_64-*-* } } } */
> > +
> > +/* { dg-additional-options "-O3 -fno-strict-aliasing -march=znver3" { target { i?86-*-* x86_64-*-* } } } */
> > +/* { dg-final { scan-tree-dump "loop vectorized" "vect" { target { i?86-*-* x86_64-*-* } } } } */
> > +
> > +#include "tree-vect.h"
> > +
> > +struct
> > +{
> > + int d;
> > + short e;
> > +} i;
> > +
> > +int b;
> > +int *h = &b;
> > +
> > +int
> > +main ()
> > +{
> > + check_vect ();
> > +
> > + short f = 1;
> > + short *g = &i.e;
> > +
> > +a:
> > + if (*g = 0 & ++f, *h)
> > + ;
> > + else
> > + {
> > + int c = 0;
> > + if (f)
> > + goto a;
> > + h = &c;
> > + }
> > +
> > + return 0;
> > +}
> > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > index c6d8b05b617f2e73e329a72516964ba0f48b677b..0e3cf327708831d69e2a6e73b103edec078a3d81 100644
> > --- a/gcc/tree-vect-loop-manip.cc
> > +++ b/gcc/tree-vect-loop-manip.cc
> > @@ -3662,7 +3662,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > tree vector_iters_vf = niters_vector_mult_vf;
> > if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > {
> > - tree scal_iv_ty = signed_type_for (TREE_TYPE (vector_iters_vf));
> > + tree scal_iv_ty = unsigned_type_for (TREE_TYPE (vector_iters_vf));
> > tree tmp_niters_vf = make_ssa_name (scal_iv_ty);
> > basic_block exit_bb = NULL;
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index f9dd88ed82468923ce6f1d01324a247a47169cde..2c014ed27371d8d8a1e731ffcd66b9fd99626e9d 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -11005,10 +11005,15 @@
> vect_update_ivs_after_vectorizer_for_early_breaks (loop_vec_info loop_vinfo)
> > final IV. */
> > if (niters_skip)
> > {
> > - induc_def = gimple_build (&iv_stmts, MAX_EXPR, TREE_TYPE (induc_def),
> > - induc_def,
> > - build_zero_cst (TREE_TYPE (induc_def)));
> > - auto stmt = gimple_build_assign (phi_var, induc_def);
> > + tree induc_type = TREE_TYPE (induc_def);
> > + tree s_induc_type = signed_type_for (induc_type);
> > + induc_def = gimple_build (&iv_stmts, MAX_EXPR, s_induc_type,
> > + gimple_convert (&iv_stmts, s_induc_type,
> > + induc_def),
> > + build_zero_cst (s_induc_type));
> > + auto stmt = gimple_build_assign (phi_var,
> > + gimple_convert (&iv_stmts, induc_type,
> > + induc_def));
> > gimple_seq_add_stmt_without_update (&iv_stmts, stmt);
> > basic_block exit_bb = NULL;
> > /* Identify the early exit merge block.  I wish we had stored this.  */
> >
> >
> > --