On Thu, 26 Jun 2025, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > The following avoids re-analyzing the loop as epilogue when not
> > using partial vectors and the mode is the same as the autodetected
> > vector mode and that has a too high VF for a non-predicated loop.
> > This situation occurs almost always on x86 and saves us one
> > re-analysis unless --param vect-partial-vector-usage is non-default.
> >
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?
> >
> > Thanks,
> > Richard.
> >
> >     * tree-vect-loop.cc (vect_analyze_loop): Prune epilogue
> >     analysis further when not using partial vectors.
> > ---
> >  gcc/tree-vect-loop.cc | 20 ++++++++++++++++++++
> >  1 file changed, 20 insertions(+)
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index b91ef4a2325..d9091c6c705 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -3770,6 +3770,26 @@ vect_analyze_loop (class loop *loop, gimple *loop_vectorized_call,
> >             break;
> >           continue;
> >         }
> > +     /* We would need an exhaustive search to find all modes we
> > +        skipped but that would lead to the same result as another
> > +        and where we'd could check cached_vf_per_mode against.
> 
> I didn't really follow this.  Is there a missing word around "another"?

I've reworded it to

          /* We would need an exhaustive search to find all modes we
             skipped but that would lead to the same result as the
             analysis it was skipped for and where we could check
             cached_vf_per_mode against.
             Check for the autodetected mode, which is the common
             situation on x86 which does not perform cost comparison.  */

Basically, the mode-skipping logic in vect_analyze_loop_1 leaves the
cached_vf_per_mode[] entries for the skipped modes unfilled (zero).
When analyzing the epilogue without partial vectors we'd ideally skip
those very same modes via the extra
maybe_ge (cached_vf_per_mode[mode_i], first_vinfo_vf) check, but a zero
entry never triggers it.

> > +        Check for the autodetected mode, which is the common
> > +        situation on x86 which does not perform cost comparison.  */
> > +     if (!supports_partial_vectors
> > +         && maybe_ge (cached_vf_per_mode[0], first_vinfo_vf)
> > +         && VECTOR_MODE_P (autodetected_vector_mode)
> > +         && (related_vector_mode (vector_modes[mode_i],
> > +                                  GET_MODE_INNER (autodetected_vector_mode))
> > +             == autodetected_vector_mode)
> > +         && (related_vector_mode (autodetected_vector_mode,
> > +                                  GET_MODE_INNER (vector_modes[mode_i]))
> > +             == vector_modes[mode_i]))
> 
> Not too keen on cutting-&-pasting all this :-)  Could we split the
> VECTOR_MODE_P onwards into a subroutine that's shared with
> vect_analyze_loop_1?

Done like below.  I do wonder, though: in which cases do the different
variants of vect_chooses_same_modes_p give different answers?

Queued for re-testing with a proposed adjustment to [1/2]; see the other
mail I'll send out soon.

Richard.

From 4bbf86e65f4a761d5081daf6216dc516e8717e31 Mon Sep 17 00:00:00 2001
From: Richard Biener <rguent...@suse.de>
Date: Thu, 26 Jun 2025 11:38:47 +0200
Subject: [PATCH] Fixup vector epilog analysis skipping when not using partial
 vectors
To: gcc-patches@gcc.gnu.org

The following avoids re-analyzing the loop as epilogue when not
using partial vectors and the mode is the same as the autodetected
vector mode, which has too high a VF for a non-predicated loop.
This situation almost always occurs on x86, so the change saves one
re-analysis there unless --param vect-partial-vector-usage is non-default.

        * tree-vectorizer.h (vect_chooses_same_modes_p): New
        overload.
        * tree-vect-stmts.cc (vect_chooses_same_modes_p): Likewise.
        * tree-vect-loop.cc (vect_analyze_loop): Prune epilogue
        analysis further when not using partial vectors.
---
 gcc/tree-vect-loop.cc  | 25 ++++++++++++++++++-------
 gcc/tree-vect-stmts.cc | 17 +++++++++++++++++
 gcc/tree-vectorizer.h  |  1 +
 3 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b91ef4a2325..81a9716d51d 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3535,13 +3535,8 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared *shared,
       mode_i += 1;
     }
   if (mode_i + 1 < vector_modes.length ()
-      && VECTOR_MODE_P (autodetected_vector_mode)
-      && (related_vector_mode (vector_modes[mode_i + 1],
-                              GET_MODE_INNER (autodetected_vector_mode))
-         == autodetected_vector_mode)
-      && (related_vector_mode (autodetected_vector_mode,
-                              GET_MODE_INNER (vector_modes[mode_i + 1]))
-         == vector_modes[mode_i + 1]))
+      && vect_chooses_same_modes_p (autodetected_vector_mode,
+                                   vector_modes[mode_i + 1]))
     {
       if (dump_enabled_p ())
        dump_printf_loc (MSG_NOTE, vect_location,
@@ -3770,6 +3765,22 @@ vect_analyze_loop (class loop *loop, gimple *loop_vectorized_call,
                break;
              continue;
            }
+         /* We would need an exhaustive search to find all modes we
+            skipped but that would lead to the same result as the
+            analysis it was skipped for and where we could check
+            cached_vf_per_mode against.
+            Check for the autodetected mode, which is the common
+            situation on x86 which does not perform cost comparison.  */
+         if (!supports_partial_vectors
+             && maybe_ge (cached_vf_per_mode[0], first_vinfo_vf)
+             && vect_chooses_same_modes_p (autodetected_vector_mode,
+                                           vector_modes[mode_i]))
+           {
+             mode_i++;
+             if (mode_i == vector_modes.length ())
+               break;
+             continue;
+           }
 
          if (dump_enabled_p ())
            dump_printf_loc (MSG_NOTE, vect_location,
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 978a4626b35..89e90d317aa 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -14582,6 +14582,23 @@ vect_chooses_same_modes_p (vec_info *vinfo, machine_mode vector_mode)
   return true;
 }
 
+/* Return true if replacing VECTOR_MODE with ALT_VECTOR_MODE would not
+   change the chosen vector modes for analysis of a loop.  */
+
+bool
+vect_chooses_same_modes_p (machine_mode vector_mode,
+                          machine_mode alt_vector_mode)
+{
+  return (VECTOR_MODE_P (vector_mode)
+         && VECTOR_MODE_P (alt_vector_mode)
+         && (related_vector_mode (vector_mode,
+                                  GET_MODE_INNER (alt_vector_mode))
+             == alt_vector_mode)
+         && (related_vector_mode (alt_vector_mode,
+                                  GET_MODE_INNER (vector_mode))
+             == vector_mode));
+}
+
 /* Function vect_is_simple_use.
 
    Input:
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 63991c3d977..f38a086d0f2 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2387,6 +2387,7 @@ extern tree get_mask_type_for_scalar_type (vec_info *, tree, unsigned int = 0);
 extern tree get_mask_type_for_scalar_type (vec_info *, tree, slp_tree);
 extern tree get_same_sized_vectype (tree, tree);
 extern bool vect_chooses_same_modes_p (vec_info *, machine_mode);
+extern bool vect_chooses_same_modes_p (machine_mode, machine_mode);
 extern bool vect_get_loop_mask_type (loop_vec_info);
 extern bool vect_is_simple_use (tree, vec_info *, enum vect_def_type *,
                                stmt_vec_info * = NULL, gimple ** = NULL);
-- 
2.43.0
