Hi,

When vectorizing with --param vect-partial-vector-usage=1, the vectorizer uses an unpredicated main loop (an all-true predicate for SVE) and a predicated tail loop. As far as I can tell, the implementation re-uses the same vector mode for both loops, which means the tail loop isn't an actual loop: it executes only a single iteration.
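
To illustrate the shape described above, here is a rough scalar model in C. It is only a sketch: the fixed VF of 8 and the function name add_shorts_model are made up for illustration (SVE's real VF is length-agnostic), and the inner "lane" loops stand in for vector instructions.

```c
#include <stddef.h>

/* Hypothetical fixed vectorization factor, for illustration only.  */
enum { VF = 8 };

/* Scalar model of an unpredicated main loop followed by a predicated
   tail that uses the same VF.  Returns how many times the tail loop
   body ran.  */
static size_t
add_shorts_model (const short *a, const short *b, short *c, size_t n)
{
  size_t i = 0;

  /* Main loop: only full vectors, every lane active.  */
  for (; i + VF <= n; i += VF)
    for (size_t lane = 0; lane < VF; ++lane)
      c[i + lane] = (short) (a[i + lane] + b[i + lane]);

  /* Tail loop: fewer than VF elements remain here, so with the same
     VF the body can run at most once.  */
  size_t tail_iters = 0;
  for (; i < n; i += VF, ++tail_iters)
    for (size_t lane = 0; lane < VF; ++lane)
      if (i + lane < n) /* Models the per-lane predicate.  */
        c[i + lane] = (short) (a[i + lane] + b[i + lane]);

  return tail_iters;
}
```

In this model the tail runs once for n = 19 and not at all for n = 16, so its latch is never taken.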

This patch uses what we know about the conditions for entering the epilogue loop to come up with a potentially more restrictive upper bound on its iteration count.
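
The arithmetic behind that bound can be sketched in plain C as below. This is a simplified model, not the code in the patch: the helper names div_ceil and narrow_epilogue_bound are hypothetical, and plain unsigned long values stand in for the poly_uint64 and widest_int types the vectorizer really uses. Ceiling division here models can_div_away_from_zero_p for the constant case.

```c
/* Hypothetical helper: unsigned ceiling division, b must be nonzero.  */
static unsigned long
div_ceil (unsigned long a, unsigned long b)
{
  return a / b + (a % b != 0);
}

/* Simplified model of the narrowing.  The largest of the main loop's
   VF, cost-model threshold and versioning threshold is taken as the
   limit on scalar iterations left for the epilogue.  Dividing by the
   epilogue VF (rounding up) bounds its iteration count, and "- 1"
   converts that to a latch count.  Assumes main_vf and epilogue_vf
   are both nonzero.  */
static unsigned long
narrow_epilogue_bound (unsigned long current_bound,
                       unsigned long main_vf,
                       unsigned long cost_threshold,
                       unsigned long versioning_threshold,
                       unsigned long epilogue_vf)
{
  unsigned long main_iters = main_vf;
  if (cost_threshold > main_iters)
    main_iters = cost_threshold;
  if (versioning_threshold > main_iters)
    main_iters = versioning_threshold;

  /* main_iters >= main_vf >= 1, so the quotient is at least 1 and
     the subtraction cannot wrap.  */
  unsigned long latch_bound = div_ceil (main_iters, epilogue_vf) - 1;

  /* Never widen an existing, tighter bound.  */
  return latch_bound < current_bound ? latch_bound : current_bound;
}
```

With both loops at a VF of 8 and both thresholds at 0, the model returns a latch bound of 0, i.e. the epilogue body runs at most once.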

Regression tested on aarch64-linux-gnu and also ran the testsuite using '--param vect-partial-vector-usage=1' detecting no ICEs and no execution failures.

Would be good to have this tested for PPC too as I believe they are the main users of the --param vect-partial-vector-usage=1 option. Can someone help me test (and maybe even benchmark?) this on a PPC target?

Kind regards,
Andre

gcc/ChangeLog:

        * tree-vect-loop.c (vect_transform_loop): Use main loop's various
        thresholds to narrow the upper bound on epilogue iterations.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/sve/part_vect_single_iter_epilog.c: New test.

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c b/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c
new file mode 100644
index 0000000000000000000000000000000000000000..a03229eb55585f637ebd5288fb4c00f8f921d44c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/part_vect_single_iter_epilog.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 --param vect-partial-vector-usage=1" } */
+
+void
+foo (short * __restrict__ a, short * __restrict__ b, short * __restrict__ c, int n)
+{
+  for (int i = 0; i < n; ++i)
+    c[i] = a[i] + b[i];
+}
+
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.h, wzr, [xw][0-9]+} 1 } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 3e973e774af8f9205be893e01ad9263281116885..81e9c5cc42415a0a92b765bc46640105670c4e6b 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -9723,12 +9723,31 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
   /* In these calculations the "- 1" converts loop iteration counts
      back to latch counts.  */
   if (loop->any_upper_bound)
-    loop->nb_iterations_upper_bound
-      = (final_iter_may_be_partial
-        ? wi::udiv_ceil (loop->nb_iterations_upper_bound + bias_for_lowest,
-                         lowest_vf) - 1
-        : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
-                          lowest_vf) - 1);
+    {
+      loop_vec_info main_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
+      loop->nb_iterations_upper_bound
+       = (final_iter_may_be_partial
+          ? wi::udiv_ceil (loop->nb_iterations_upper_bound + bias_for_lowest,
+                           lowest_vf) - 1
+          : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
+                            lowest_vf) - 1);
+      if (main_vinfo)
+       {
+         unsigned int bound;
+         poly_uint64 main_iters
+           = upper_bound (LOOP_VINFO_VECT_FACTOR (main_vinfo),
+                          LOOP_VINFO_COST_MODEL_THRESHOLD (main_vinfo));
+         main_iters
+           = upper_bound (main_iters,
+                          LOOP_VINFO_VERSIONING_THRESHOLD (main_vinfo));
+         if (can_div_away_from_zero_p (main_iters,
+                                       LOOP_VINFO_VECT_FACTOR (loop_vinfo),
+                                       &bound))
+           loop->nb_iterations_upper_bound
+             = wi::umin ((widest_int) (bound - 1),
+                         loop->nb_iterations_upper_bound);
+      }
+  }
   if (loop->any_likely_upper_bound)
     loop->nb_iterations_likely_upper_bound
       = (final_iter_may_be_partial
