> -----Original Message-----
> From: Richard Biener <[email protected]>
> Sent: 25 June 2026 13:08
> To: Pengfei Li <[email protected]>
> Cc: [email protected]; Tamar Christina <[email protected]>;
> [email protected]
> Subject: Re: [PATCH] vect: Avoid over-unrolling for known small-iteration
> loops
> 
> On Thu, 25 Jun 2026, Pengfei Li wrote:
> 
> >
> > On 25/06/2026 12:13, Richard Biener wrote:
> > > On Thu, 25 Jun 2026, Pengfei Li wrote:
> > >
> > >> In loop vectorization analysis, the target can suggest an unroll factor
> > >> based on the cost model to expose more ILP. However, when a loop has a
> > >> known iteration count that is no greater than the current vectorization
> > >> factor, the vectorized loop will be executed at most once. In this case,
> > >> applying a suggested unroll factor greater than 1 only increases the
> > >> code size and complexity of the loop body.
> > >>
> > >> The testcase added in this patch has a fixed 16-iteration byte SAD loop.
> > >> When compiling it on some AArch64 SVE targets, the cost model suggests
> > >> an unroll factor of 4 even though one vector iteration in VNx16QI mode
> > >> covers all 16 scalar iterations. The extra unrolled chunks are fully
> > >> masked off and redundant.
> > >>
> > >> This fixes the issue by resetting the suggested unroll factor when the
> > >> iteration count is known to be no greater than the current VF.
> > > Huh, but we have already analyzed the loop with unroll factor == 1
> > > so failing here is exactly what we want?
> >
> > Just to make sure I understand. Are you saying that this is a wrong place to
> > reject the suggested unroll factor?
> 
> I'm saying that the factor will be rejected by costing when re-analyzing
> the loop with the suggested unroll factor.  Does it not?
> 
> > I put the check here to make it next to an existing max-VF check, but do you
> > think this should be handled by just skipping the vect re-analysis with
> > unroll factor > 1 in an outer function like vect_analyze_loop_1() ?
> 
> No, I'm saying we should do what the target asks us to and I think
> we'll reject it outright anyway via an existing check.  Yes, we do
> extra work, but then the fix is to the target to not suggest such
> unrolling when it's obviously nonsense.

That's my view on this too.  The bad unroll factor is coming from the target
which is overriding the default of 1.

This has to do with the complexity of the comparison code vs Adv. SIMD. The
unroll factor is set on the assumption that the loop will be unrolled with
Adv. SIMD, but costing ends up picking SVE instead, and as such it unrolls
the SVE code instead.

But it's the target doing something dumb here and we shouldn't waste the
time of the middle end and should not propose broken unroll factors.
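
As a rough illustration of the kind of guard the target could apply before
suggesting an unroll factor, here is a minimal standalone sketch.  It is not
the actual aarch64 cost-model code: LoopSummary and clamp_unroll_factor are
hypothetical names, and the variable-length VF is modelled only by its
minimum lane count (which is what a known_le-style comparison against a
constant iteration count effectively tests).

```cpp
#include <cstdint>

// Hypothetical summary of what a target cost model knows about the loop.
// These fields are assumptions for illustration, not GCC data structures.
struct LoopSummary {
  bool niters_known;     // is the scalar iteration count a compile-time constant?
  std::uint64_t niters;  // that iteration count (only meaningful if known)
  std::uint64_t min_vf;  // minimum lanes covered by one vector iteration
};

// Return the unroll factor the target should actually suggest.
// If one vector iteration already covers every scalar iteration, the
// vector loop runs at most once and any extra unrolled copies would be
// fully masked off, so fall back to the default of 1.
constexpr unsigned
clamp_unroll_factor (const LoopSummary &loop, unsigned requested)
{
  return (requested <= 1) ? 1u
         : (loop.niters_known && loop.niters <= loop.min_vf) ? 1u
         : requested;
}
```

For the testcase in the patch (16 known iterations, at least 16 byte lanes
per vector), a requested factor of 4 would be clamped to 1, while a loop
with an unknown or larger iteration count keeps the requested factor.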

Thanks,
Tamar
> 
> Richard.
> 
> >
> > Thanks,
> > Pengfei
> >
> > >> Bootstrapped and tested on aarch64-linux-gnu and x86_64-linux-gnu.
> > >>
> > >> gcc/ChangeLog:
> > >>
> > >>  * tree-vect-loop.cc (vect_estimate_min_profitable_iters): Reset
> > >>  the suggested unroll factor for small iteration-count loops.
> > >>
> > >> gcc/testsuite/ChangeLog:
> > >>
> > >>  * gcc.target/aarch64/sve/vect-no-unroll-1.c: New test.
> > >> ---
> > >>   .../gcc.target/aarch64/sve/vect-no-unroll-1.c | 17 +++++++++
> > >>   gcc/tree-vect-loop.cc                         | 38 +++++++++++++------
> > >>   2 files changed, 43 insertions(+), 12 deletions(-)
> > >>   create mode 100644
> > >>   gcc/testsuite/gcc.target/aarch64/sve/vect-no-unroll-1.c
> > >>
> > >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-no-unroll-1.c
> > >> b/gcc/testsuite/gcc.target/aarch64/sve/vect-no-unroll-1.c
> > >> new file mode 100644
> > >> index 00000000000..7dfa851a1da
> > >> --- /dev/null
> > >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-no-unroll-1.c
> > >> @@ -0,0 +1,17 @@
> > >> +/* Check that a small-niters loop is not over-unrolled.  */
> > >> +/* { dg-do compile } */
> > > >> +/* { dg-options "-O2 -mtune=neoverse-v2 -mautovec-preference=sve-only" } */
> > >> +
> > >> +#include <stdint.h>
> > >> +#include <stdlib.h>
> > >> +
> > >> +int
> > >> +foo (uint8_t *p1, uint8_t *p2)
> > >> +{
> > >> +  int sum = 0;
> > >> +  for (int i = 0; i < 16; i++)
> > >> +    sum += abs (p1[i] - p2[i]);
> > >> +  return sum;
> > >> +}
> > >> +
> > >> +/* { dg-final { scan-assembler-not {\tld1b\t[^\n]*, mul vl} } } */
> > >> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > >> index 0167d52b28b..45e5eca5724 100644
> > >> --- a/gcc/tree-vect-loop.cc
> > >> +++ b/gcc/tree-vect-loop.cc
> > >> @@ -4420,19 +4420,33 @@ vect_estimate_min_profitable_iters
> (loop_vec_info
> > >> loop_vinfo,
> > >>       *suggested_unroll_factor
> > >>         = loop_vinfo->vector_costs->suggested_unroll_factor ();
> > >>   -  if (suggested_unroll_factor && *suggested_unroll_factor > 1
> > >> -      && LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo) !=
> > >> MAX_VECTORIZATION_FACTOR
> > >> -      && !known_le (LOOP_VINFO_VECT_FACTOR (loop_vinfo) *
> > >> -                    *suggested_unroll_factor,
> > >> -                    LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo)))
> > >> +  if (suggested_unroll_factor && *suggested_unroll_factor > 1)
> > >>       {
> > >> -      if (dump_enabled_p ())
> > >> -        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > >> -                         "can't unroll as unrolled vectorization factor
> > >> larger"
> > >> -                         " than maximum vectorization factor: "
> > >> -                         HOST_WIDE_INT_PRINT_UNSIGNED "\n",
> > >> -                         LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo));
> > >> -      *suggested_unroll_factor = 1;
> > >> +      if (LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo) !=
> > >> MAX_VECTORIZATION_FACTOR
> > >> +          && !known_le (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
> > >> +                        * *suggested_unroll_factor,
> > >> +                        LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo)))
> > >> +        {
> > >> +          if (dump_enabled_p ())
> > >> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > >> +                             "can't unroll as unrolled vectorization 
> > >> factor "
> > >> +                             "larger than maximum vectorization factor: 
> > >> "
> > >> +                             HOST_WIDE_INT_PRINT_UNSIGNED "\n",
> > >> +                             LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo));
> > >> +          *suggested_unroll_factor = 1;
> > >> +        }
> > >> +      else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > >> +               && known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> > >> +                            LOOP_VINFO_VECT_FACTOR (loop_vinfo)))
> > >> +        {
> > >> +          if (dump_enabled_p ())
> > >> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > >> +                             "can't unroll as the loop iteration count 
> > >> is "
> > >> +                             "no greater than the vectorization factor: 
> > >> "
> > >> +                             HOST_WIDE_INT_PRINT_UNSIGNED "\n",
> > >> +                             LOOP_VINFO_INT_NITERS (loop_vinfo));
> > >> +          *suggested_unroll_factor = 1;
> > >> +        }
> > >>       }
> > >>
> > >>     vec_outside_cost = (int)(vec_prologue_cost + vec_epilogue_cost);
> > >>
> >
> >
> 
> --
> Richard Biener <[email protected]>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)
