On Thu, 25 Jun 2026, Pengfei Li wrote:
> In loop vectorization analysis, the target can suggest an unroll factor
> based on the cost model to expose more ILP. However, when a loop has a
> known iteration count that is no greater than the current vectorization
> factor, the vectorized loop will be executed at most once. In this case,
> applying a suggested unroll factor greater than 1 only increases the
> code size and complexity of the loop body.
>
> The testcase added in this patch has a fixed 16-iteration byte SAD loop.
> When compiling it on some AArch64 SVE targets, the cost model suggests
> an unroll factor of 4 even though one vector iteration in VNx16QI mode
> covers all 16 scalar iterations. The extra unrolled chunks are fully
> masked off and redundant.
>
> This fixes the issue by resetting the suggested unroll factor when the
> iteration count is known to be no greater than the current VF.
Huh, but we have already analyzed the loop with unroll factor == 1,
so isn't failing here exactly what we want?
> Bootstrapped and tested on aarch64-linux-gnu and x86_64-linux-gnu.
>
> gcc/ChangeLog:
>
> * tree-vect-loop.cc (vect_estimate_min_profitable_iters): Reset
> the suggested unroll factor for small iteration-count loops.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/sve/vect-no-unroll-1.c: New test.
> ---
> .../gcc.target/aarch64/sve/vect-no-unroll-1.c | 17 +++++++++
> gcc/tree-vect-loop.cc | 38 +++++++++++++------
> 2 files changed, 43 insertions(+), 12 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vect-no-unroll-1.c
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-no-unroll-1.c b/gcc/testsuite/gcc.target/aarch64/sve/vect-no-unroll-1.c
> new file mode 100644
> index 00000000000..7dfa851a1da
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-no-unroll-1.c
> @@ -0,0 +1,17 @@
> +/* Check that a small-niters loop is not over-unrolled. */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mtune=neoverse-v2 -mautovec-preference=sve-only" } */
> +
> +#include <stdint.h>
> +#include <stdlib.h>
> +
> +int
> +foo (uint8_t *p1, uint8_t *p2)
> +{
> +  int sum = 0;
> +  for (int i = 0; i < 16; i++)
> +    sum += abs (p1[i] - p2[i]);
> +  return sum;
> +}
> +
> +/* { dg-final { scan-assembler-not {\tld1b\t[^\n]*, mul vl} } } */
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 0167d52b28b..45e5eca5724 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -4420,19 +4420,33 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
> *suggested_unroll_factor
> = loop_vinfo->vector_costs->suggested_unroll_factor ();
>
> - if (suggested_unroll_factor && *suggested_unroll_factor > 1
> - && LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo) != MAX_VECTORIZATION_FACTOR
> - && !known_le (LOOP_VINFO_VECT_FACTOR (loop_vinfo) *
> - *suggested_unroll_factor,
> - LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo)))
> + if (suggested_unroll_factor && *suggested_unroll_factor > 1)
> {
> - if (dump_enabled_p ())
> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> - "can't unroll as unrolled vectorization factor larger"
> - " than maximum vectorization factor: "
> - HOST_WIDE_INT_PRINT_UNSIGNED "\n",
> - LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo));
> - *suggested_unroll_factor = 1;
> + if (LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo) != MAX_VECTORIZATION_FACTOR
> + && !known_le (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
> + * *suggested_unroll_factor,
> + LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo)))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "can't unroll as unrolled vectorization factor "
> + "larger than maximum vectorization factor: "
> + HOST_WIDE_INT_PRINT_UNSIGNED "\n",
> + LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo));
> + *suggested_unroll_factor = 1;
> + }
> + else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> + && known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> + LOOP_VINFO_VECT_FACTOR (loop_vinfo)))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "can't unroll as the loop iteration count is "
> + "no greater than the vectorization factor: "
> + HOST_WIDE_INT_PRINT_UNSIGNED "\n",
> + LOOP_VINFO_INT_NITERS (loop_vinfo));
> + *suggested_unroll_factor = 1;
> + }
> }
>
> vec_outside_cost = (int)(vec_prologue_cost + vec_epilogue_cost);
>
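For reference, the decision the quoted hunk makes can be sketched
standalone as below. This is only an illustration under simplifying
assumptions: plain uint64_t values stand in for the poly_uint64 values
that known_le compares in the vectorizer, and the names
clamp_unroll_factor and NO_MAX_VF are hypothetical, not GCC
identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for "no upper bound on the VF"; the patch
   compares against MAX_VECTORIZATION_FACTOR instead.  */
#define NO_MAX_VF UINT64_MAX

/* Return the unroll factor to keep after the two checks in the hunk.
   SUGGESTED is the target's suggestion, VF the current vectorization
   factor, MAX_VF the maximum VF (or NO_MAX_VF), and NITERS the scalar
   iteration count, which is only meaningful if NITERS_KNOWN.  */
static uint64_t
clamp_unroll_factor (uint64_t suggested, uint64_t vf, uint64_t max_vf,
                     bool niters_known, uint64_t niters)
{
  assert (suggested >= 1 && vf >= 1);

  if (suggested == 1)
    return 1;

  /* Existing check: VF * SUGGESTED must not exceed MAX_VF.  The
     division form avoids overflowing the multiplication.  */
  if (max_vf != NO_MAX_VF && suggested > max_vf / vf)
    return 1;

  /* Check added by the patch: one vector iteration already covers
     every scalar iteration, so unrolling adds nothing.  */
  if (niters_known && niters <= vf)
    return 1;

  return suggested;
}
```

With the numbers from the testcase (16 iterations, a suggestion of 4,
and 16 taken as an assumed VF for VNx16QI),
clamp_unroll_factor (4, 16, NO_MAX_VF, true, 16) yields 1, while the
same call with 64 iterations keeps the suggested 4.
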
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)