On Mon, Dec 15, 2025 at 6:27 PM Tamar Christina <[email protected]> wrote:
>
> In the PR we see that the new scalar IV tricks other passes into thinking
> there's an overflow due to the use of a signed counter:
>
> The loop is known to iterate 8191 times, we have a VF of 8, and the IV
> starts at 2.
>
> The codegen out of the vectorizer is the same as before, except we now have
> a scalar variable counting the scalar iteration count instead of a vector
> one.
>
> i.e. we have
>
> _45 = _39 + 8;
>
> vs
>
> _46 = _45 + { 16, 16, 16, 16, ... }
>
> we pick a lower VF now since costing allows it, but that's not important.
>
> When we get to cunroll, since the value is now scalar, it sees that 8 * 8191
> would overflow a signed short, so it changes the loop bound to the largest
> possible signed value and then uses this to fold the ivtmp_50 < 8191 exit
> test to always true, so you get an infinite loop:
>
> Analyzing # of iterations of loop 1
>   exit condition [1, + , 1](no_overflow) < 8191
>   bounds on difference of bases: 8190 ... 8190
>   result:
>     # of iterations 8190, bounded by 8190
> Statement (exit)if (ivtmp_50 < 8191)
>  is executed at most 8190 (bounded by 8190) + 1 times in loop 1.
> Induction variable (signed short) 8 + 8 * iteration does not wrap in statement
> _45 = _39 + 8;
>  in loop 1.
> Statement _45 = _39 + 8;
>  is executed at most 4094 (bounded by 4094) + 1 times in loop 1.
>
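> As a standalone illustration (a sketch with made-up names, assuming a
> 16-bit signed short), the deduction boils down to:
>
>   signed short scal_iv = 8;   /* scalar iteration count, step 8     */
>   unsigned short ivtmp = 1;   /* the [1, +, 1] exit IV              */
>   do
>     {
>       scal_iv += 8;           /* 8 + 8*i: signed overflow (UB) once
>                                  executed more than 4095 times      */
>       ivtmp += 1;
>     }
>   while (ivtmp < 8191);       /* the no-wrap assumption bounds the
>                                  loop to 4095 iterations, so this
>                                  test folds to always true and the
>                                  loop becomes endless               */
>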
> The signed type was originally chosen because of the negative offset we use
> when adjusting for peeling for alignment with masks.  However, this then
> introduces issues with signed overflow, as we see here.  This patch instead
> uses an unsigned type and adjusts the MAX to always be computed in a signed
> type.  Since unsigned underflow is defined and wraps around, the negative
> offset for the PFA isn't an issue.
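>
> In C terms the updated IV adjustment is roughly as follows (a sketch only,
> assuming a 16-bit IV type; clamp_iv and its parameters are made-up names):
>
>   unsigned short
>   clamp_iv (unsigned short start, unsigned short niters_skip)
>   {
>     unsigned short induc = start - niters_skip; /* may wrap below 0  */
>     short s_induc = (short) induc;              /* convert to signed */
>     if (s_induc < 0)                            /* MAX_EXPR (s, 0)   */
>       s_induc = 0;
>     return (unsigned short) s_induc;            /* back to IV type   */
>   }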

I think there might be situations where the scalar niter does not fit a
signed niter type, and thus using a signed MAX could be wrong?  For loop
masking we have some elaborate code to choose the scalar niter IV type; I
suppose we could re-use that here?
The alignment peeling offset can also make an unsigned IV not enough to
cover all scalar iterations.  With VF == 1 the current vector IV might also
be prone to overflow for the degenerate niter == UINT_MAX case when doing
mask peeling for alignment?
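
For example (hypothetical numbers, 16-bit IV type for brevity): a valid
scalar count above SHRT_MAX turns negative in the signed view, and the
clamp then destroys it:

  unsigned short induc = 40000;  /* legitimate count, > SHRT_MAX  */
  short s_induc = (short) induc; /* -25536 after the conversion   */
  if (s_induc < 0)               /* the signed MAX_EXPR with 0    */
    s_induc = 0;                 /* the real count is lost        */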

It might be good to try adding some test coverage for the degenerate cases,
possibly on -m32 only so as to limit run-time (maybe without any memory
access, to avoid large memory use as well).
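
Something along these lines, perhaps (an untested sketch, not a proposed
testcase; it assumes 32-bit unsigned int and uses computation only, no
loads or stores):

  unsigned int i = 0;
  unsigned char r = 0;
  do
    {
      r += (unsigned char) i;    /* cheap work, no memory access      */
      if (r == 123)              /* the early break                   */
        break;
    }
  while (++i != 0);              /* niter == UINT_MAX degenerate case */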

Richard.

> Bootstrapped and regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, and x86_64-pc-linux-gnu (-m32 and -m64) with no
> issues.
>
> Pushed.
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>         PR tree-optimization/123089
>         * tree-vect-loop-manip.cc (vect_do_peeling): Use unsigned type for
>         scalar IV.
>         * tree-vect-loop.cc (vect_update_ivs_after_vectorizer_for_early_breaks):
>         Use signed MAX for PFA offset update.
>
> gcc/testsuite/ChangeLog:
>
>         PR tree-optimization/123089
>         * gcc.dg/vect/vect-early-break_141-pr123089.c: New test.
>
> ---
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_141-pr123089.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_141-pr123089.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..431edbfbde6731e205788495a93d90e252e717f0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_141-pr123089.c
> @@ -0,0 +1,40 @@
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break_hw } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target avx2_runtime { target { i?86-*-* x86_64-*-* } } } */
> +
> +/* { dg-additional-options "-O3 -fno-strict-aliasing -march=znver3" { target { i?86-*-* x86_64-*-* } } } */
> +/* { dg-final { scan-tree-dump "loop vectorized" "vect" { target { i?86-*-* x86_64-*-* } } } } */
> +
> +#include "tree-vect.h"
> +
> +struct
> +{
> +  int d;
> +  short e;
> +} i;
> +
> +int b;
> +int *h = &b;
> +
> +int
> +main ()
> +{
> +  check_vect ();
> +
> +  short f = 1;
> +  short *g = &i.e;
> +
> +a:
> +  if (*g = 0 & ++f, *h)
> +    ;
> +  else
> +    {
> +      int c = 0;
> +      if (f)
> +        goto a;
> +      h = &c;
> +    }
> +
> +  return 0;
> +}
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index c6d8b05b617f2e73e329a72516964ba0f48b677b..0e3cf327708831d69e2a6e73b103edec078a3d81 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -3662,7 +3662,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>        tree vector_iters_vf = niters_vector_mult_vf;
>        if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
>         {
> -         tree scal_iv_ty = signed_type_for (TREE_TYPE (vector_iters_vf));
> +         tree scal_iv_ty = unsigned_type_for (TREE_TYPE (vector_iters_vf));
>           tree tmp_niters_vf = make_ssa_name (scal_iv_ty);
>           basic_block exit_bb = NULL;
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index f9dd88ed82468923ce6f1d01324a247a47169cde..2c014ed27371d8d8a1e731ffcd66b9fd99626e9d 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -11005,10 +11005,15 @@ vect_update_ivs_after_vectorizer_for_early_breaks (loop_vec_info loop_vinfo)
>       final IV.  */
>    if (niters_skip)
>      {
> -      induc_def = gimple_build (&iv_stmts, MAX_EXPR, TREE_TYPE (induc_def),
> -                               induc_def,
> -                               build_zero_cst (TREE_TYPE (induc_def)));
> -      auto stmt = gimple_build_assign (phi_var, induc_def);
> +      tree induc_type = TREE_TYPE (induc_def);
> +      tree s_induc_type = signed_type_for (induc_type);
> +      induc_def = gimple_build (&iv_stmts, MAX_EXPR, s_induc_type,
> +                               gimple_convert (&iv_stmts, s_induc_type,
> +                                               induc_def),
> +                               build_zero_cst (s_induc_type));
> +      auto stmt = gimple_build_assign (phi_var,
> +                                      gimple_convert (&iv_stmts, induc_type,
> +                                                      induc_def));
>        gimple_seq_add_stmt_without_update (&iv_stmts, stmt);
>        basic_block exit_bb = NULL;
>        /* Identify the early exit merge block.  I wish we had stored this.  */
>
>
> --
