On Thu, 25 Jun 2026, Pengfei Li wrote:

> In loop vectorization analysis, the target can suggest an unroll factor
> based on the cost model to expose more ILP. However, when a loop has a
> known iteration count that is no greater than the current vectorization
> factor, the vectorized loop will be executed at most once. In this case,
> applying a suggested unroll factor greater than 1 only increases the
> code size and complexity of the loop body.
> 
> The testcase added in this patch has a fixed 16-iteration byte SAD loop.
> When compiling it on some AArch64 SVE targets, the cost model suggests
> an unroll factor of 4 even though one vector iteration in VNx16QI mode
> covers all 16 scalar iterations. The extra unrolled chunks are fully
> masked off and redundant.
> 
> This fixes the issue by resetting the suggested unroll factor when the
> iteration count is known to be no greater than the current VF.

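For reference, the condition the description proposes can be sketched
outside of GCC as below. This is only an illustration under assumptions:
effective_unroll is a hypothetical helper, not a function in
tree-vect-loop.cc, and it treats the VF as a plain integer, whereas the
patch compares poly_int values with known_le (so for a variable-length
mode such as VNx16QI the test must hold for every possible vector
length).

```c
#include <stdbool.h>

/* Hypothetical helper, not GCC code: return the unroll factor to use for
   a loop whose scalar iteration count NITERS is known at compile time,
   given a fixed vectorization factor VF and the unroll factor SUGGESTED
   by the cost model.  */
static unsigned
effective_unroll (unsigned long niters, unsigned long vf, unsigned suggested)
{
  /* One vector iteration already covers every scalar iteration, so any
     further unrolled copies would do no useful work.  */
  bool single_vector_iter = niters <= vf;

  if (suggested > 1 && single_vector_iter)
    return 1;

  return suggested;
}
```

With the numbers from the testcase, effective_unroll (16, 16, 4) yields 1,
while a 64-iteration loop with the same VF keeps the suggested factor 4.
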
Huh, but we have already analyzed the loop with unroll factor == 1,
so failing here is exactly what we want?

> Bootstrapped and tested on aarch64-linux-gnu and x86_64-linux-gnu.
> 
> gcc/ChangeLog:
> 
>       * tree-vect-loop.cc (vect_estimate_min_profitable_iters): Reset
>       the suggested unroll factor for small iteration-count loops.
> 
> gcc/testsuite/ChangeLog:
> 
>       * gcc.target/aarch64/sve/vect-no-unroll-1.c: New test.
> ---
>  .../gcc.target/aarch64/sve/vect-no-unroll-1.c | 17 +++++++++
>  gcc/tree-vect-loop.cc                         | 38 +++++++++++++------
>  2 files changed, 43 insertions(+), 12 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vect-no-unroll-1.c
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-no-unroll-1.c b/gcc/testsuite/gcc.target/aarch64/sve/vect-no-unroll-1.c
> new file mode 100644
> index 00000000000..7dfa851a1da
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-no-unroll-1.c
> @@ -0,0 +1,17 @@
> +/* Check that a small-niters loop is not over-unrolled.  */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mtune=neoverse-v2 -mautovec-preference=sve-only" } */
> +
> +#include <stdint.h>
> +#include <stdlib.h>
> +
> +int
> +foo (uint8_t *p1, uint8_t *p2)
> +{
> +  int sum = 0;
> +  for (int i = 0; i < 16; i++)
> +    sum += abs (p1[i] - p2[i]);
> +  return sum;
> +}
> +
> +/* { dg-final { scan-assembler-not {\tld1b\t[^\n]*, mul vl} } } */
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 0167d52b28b..45e5eca5724 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -4420,19 +4420,33 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
>      *suggested_unroll_factor
>        = loop_vinfo->vector_costs->suggested_unroll_factor ();
>  
> -  if (suggested_unroll_factor && *suggested_unroll_factor > 1
> -      && LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo) != MAX_VECTORIZATION_FACTOR
> -      && !known_le (LOOP_VINFO_VECT_FACTOR (loop_vinfo) *
> -                 *suggested_unroll_factor,
> -                 LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo)))
> +  if (suggested_unroll_factor && *suggested_unroll_factor > 1)
>      {
> -      if (dump_enabled_p ())
> -     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                      "can't unroll as unrolled vectorization factor larger"
> -                      " than maximum vectorization factor: "
> -                      HOST_WIDE_INT_PRINT_UNSIGNED "\n",
> -                      LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo));
> -      *suggested_unroll_factor = 1;
> +      if (LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo) != MAX_VECTORIZATION_FACTOR
> +       && !known_le (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
> +                     * *suggested_unroll_factor,
> +                     LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo)))
> +     {
> +       if (dump_enabled_p ())
> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                          "can't unroll as unrolled vectorization factor "
> +                          "larger than maximum vectorization factor: "
> +                          HOST_WIDE_INT_PRINT_UNSIGNED "\n",
> +                          LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo));
> +       *suggested_unroll_factor = 1;
> +     }
> +      else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> +            && known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> +                         LOOP_VINFO_VECT_FACTOR (loop_vinfo)))
> +     {
> +       if (dump_enabled_p ())
> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                          "can't unroll as the loop iteration count is "
> +                          "no greater than the vectorization factor: "
> +                          HOST_WIDE_INT_PRINT_UNSIGNED "\n",
> +                          LOOP_VINFO_INT_NITERS (loop_vinfo));
> +       *suggested_unroll_factor = 1;
> +     }
>      }
>  
>    vec_outside_cost = (int)(vec_prologue_cost + vec_epilogue_cost);
> 

-- 
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
