On Thu, 26 Jun 2025, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > The following fixes the computation of supports_partial_vectors which
> > is used to prune the set of modes to iterate over for epilog
> > vectorization.  The partial_vectors_supported_p predicate used
> > only looks for while_ult, while we also support predication when
> > mask modes are integer modes, as for AVX512.
> >
> > I've noticed this isn't very effective on x86_64 anyway since
> > if the main loop mode is autodetected we skip re-analyzing
> > mode_i == 0, but then mode_i == 1 is usually the very same
> > large mode.
> >
> > Thus I do wonder if we should instead always (or when
> > --param vect-partial-vector-usage != 0, or when the target would
> > support predication in principle) perform main loop analysis
> > with partial vectors in mind (start with can_use_partial_vectors_p =
> > true), but only at the end honor the --param when deciding on
> > using_partial_vectors_p.  We can then remember can_use_partial_vectors_p
> > for each analyzed mode and use that more specific info for the
> > pruning?
> 
> Yeah, sounds like that could work.  In principle, epilogue loops should
> be strictly easier to vectorise than main loops.  If you know that the
> epilogue "loop" never iterates, there could in principle be cases
> where we'd need to clear can_use_partial_vectors_p for the main loop
> but not for the epilogue loop.  I can't think of any situation like
> that off-hand though.  Likewise for unrolling.

We already analyze the main loop for partial vector usage when
--param vect-partial-vector-usage != 0, so for the purpose of
pruning epilogue analysis we should be able to use
LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P.
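
To make the intended pruning concrete, here is a small standalone sketch.
It is a toy model, not GCC code: MainLoopInfo, supports_partial_vectors ()
as a function, and epilogue_candidate_vfs () are hypothetical names, and
the rule "skip epilogue modes whose VF is at least the main loop VF unless
partial vectors are usable" is my assumption about how the flag is consumed.

```cpp
// Toy model only: these types and functions are NOT part of GCC.
#include <cassert>
#include <vector>

struct MainLoopInfo {
  bool can_use_partial_vectors_p;  // stand-in for LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P
  unsigned vf;                     // stand-in for LOOP_VINFO_VECT_FACTOR
};

// Mirrors the patched initializer: the main loop analysis result is
// combined with the --param vect-partial-vector-usage setting.
inline bool supports_partial_vectors(const MainLoopInfo &main_loop,
                                     int param_vect_partial_vector_usage) {
  return main_loop.can_use_partial_vectors_p
         && param_vect_partial_vector_usage != 0;
}

// Assumed pruning rule: a candidate epilogue mode whose VF is at least the
// main loop VF is only worth analyzing when partial vectors are usable.
inline std::vector<unsigned>
epilogue_candidate_vfs(const MainLoopInfo &main_loop,
                       int param_vect_partial_vector_usage,
                       const std::vector<unsigned> &mode_vfs) {
  assert(main_loop.vf > 0);
  const bool partial
    = supports_partial_vectors(main_loop, param_vect_partial_vector_usage);
  std::vector<unsigned> kept;
  for (unsigned vf : mode_vfs)
    if (partial || vf < main_loop.vf)
      kept.push_back(vf);
  return kept;
}
```

With a main loop VF of 16 and candidate VFs {16, 8, 4}, this model keeps
all three when the main loop could use partial vectors and the param is
nonzero, and only {8, 4} otherwise.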

As you say, there might in theory be corner cases, such as when
applying a suggested unroll factor to the main loop.  I can't
think of a concrete case where that matters, though, so we can in
principle just remember the analysis result without unrolling if required.

Basically it would look like the patch below; I'll post it separately
again so the CI can pick it up.

Would that be OK as-is or do you think we should be looking
to deal with the unrolled main loop case preventively?

Thanks,
Richard.

From ef60826a888247da723385c84c1dca2aead7b2e4 Mon Sep 17 00:00:00 2001
From: Richard Biener <rguent...@suse.de>
Date: Thu, 26 Jun 2025 11:08:04 +0200
Subject: [PATCH] Fixup partial_vectors_supported_p use
To: gcc-patches@gcc.gnu.org

The following fixes the computation of supports_partial_vectors which
is used to prune the set of modes to iterate over for epilog
vectorization.  The partial_vectors_supported_p predicate used
only looks for while_ult, while we also support predication when
mask modes are integer modes, as for AVX512.

I've noticed this isn't very effective on x86_64 anyway since
if the main loop mode is autodetected we skip re-analyzing
mode_i == 0, but then mode_i == 1 is usually the very same
large mode.  This is fixed by the next patch.

The following simplifies the logic by re-using the
already computed LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P from
the main loop to decide whether we can possibly use partial
vectors for the epilogue (for the case of having the same VF).

        * tree-vect-loop.cc (vect_analyze_loop): Use the main
        loop partial vector analysis result to decide if epilogues
        with the same VF can use partial vectors.
---
 gcc/tree-vect-loop.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index c824b5abaaf..603d60d8977 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3742,8 +3742,9 @@ vect_analyze_loop (class loop *loop, gimple *loop_vectorized_call,
     vector_modes[0] = autodetected_vector_mode;
   mode_i = 0;
 
-  bool supports_partial_vectors =
-    partial_vectors_supported_p () && param_vect_partial_vector_usage != 0;
+  bool supports_partial_vectors
+    = (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (first_loop_vinfo)
+       && param_vect_partial_vector_usage != 0);
   poly_uint64 first_vinfo_vf = LOOP_VINFO_VECT_FACTOR (first_loop_vinfo);
 
   loop_vec_info orig_loop_vinfo = first_loop_vinfo;
-- 
2.43.0
