On Fri, 27 Jun 2025, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > On Fri, 27 Jun 2025, Richard Biener wrote:
> >
> >> On Thu, 26 Jun 2025, Richard Sandiford wrote:
> >> 
> >> > Richard Biener <rguent...@suse.de> writes:
> >> > > The following fixes the computation of supports_partial_vectors which
> >> > > is used to prune the set of modes to iterate over for epilog
> >> > > vectorization.  The used partial_vectors_supported_p predicate
> >> > > only looks for while_ult while also support predication when
> >> > > mask modes are integer modes as for AVX512.
> >> > >
> >> > > I've noticed this isn't very effective on x86_64 anyway since
> >> > > if the main loop mode is autodetected we skip re-analyzing
> >> > > mode_i == 0, but then mode_i == 1 is usually the very same
> >> > > large mode.
> >> > >
> >> > > Thus I do wonder if we should instead always (or when
> >> > > --param vect-partial-vector-usage != 0, or when the target would
> >> > > support predication in principle) perform main loop analysis
> >> > > with partial vectors in mind (start with can_use_partial_vectors_p =
> >> > > true), but only at the end honor the --param when deciding on
> >> > > using_partial_vectors_p.  We can then remember 
> >> > > can_use_partial_vectors_p
> >> > > for each analyzed mode and use that more specific info for the
> >> > > pruning?
> >> > 
> >> > Yeah, sounds like that could work.  In principle, epilogue loops should
> >> > be strictly easier to vectorise than main loops.  If you know that the
> >> > epilogue "loop" never iterates, there could in principle be cases
> >> > where we'd need to clear can_use_partial_vectors_p for the main loop
> >> > but not for the epilogue loop.  I can't think of any situation like
> >> > that off-hand though.  Likewise for unrolling.
> >> 
> >> So we already do analyze the main loop for partial vector usage when
> >> --param vect-partial-vector-usage != 0, so for the purpose of
> >> pruning epilogue analysis we should be able to use
> >> LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P.
> >> 
> >> As you say there might in theory be corner cases, like when
> >> applying a suggested unroll factor to the main loop.  I can't
> >> think of a reason for when we don't, so we can in principle
> >> just remember the analysis result without if required.
> >> 
> >> But basically it would be like below, I'll post this separately
> >> again so the CI can pick it up.
> >> 
> >> Would that be OK as-is or do you think we should be looking
> >> to deal with the unrolled main loop case preventively?
> >
> > It's easy enough to do, like with the following.  So that's what
> > I'm going to test.
> 
> Argh!  Sorry, just realised that this won't work for AArch64 after all.
> 
> LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P doesn't include information about
> whether the loop control is supported (good), but it does still contain
> information about whether masking is supported for individual operations,
> with that information being specific to the current vector mode,
> rather than being a general statement about the target as a whole.
> 
> So I think this would have the effect of preventing SVE epilogue loops
> for Advanced SIMD main loops, which is something we currently support.
> LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P would be false for any nontrivial
> Advanced SIMD loop due to the lack of masked load/store support.

Oh, I see.  I'll go with the original fix for this then.

Richard.

> Richard
> 
> >
> > Richard.
> >
> > From b0ae2522e8ddb3381e7e22995c0ce3e700c53755 Mon Sep 17 00:00:00 2001
> > From: Richard Biener <rguent...@suse.de>
> > Date: Thu, 26 Jun 2025 11:08:04 +0200
> > Subject: [PATCH] Fixup partial_vectors_supported_p use
> > To: gcc-patches@gcc.gnu.org
> >
> > The following fixes the computation of supports_partial_vectors which
> > is used to prune the set of modes to iterate over for epilog
> > vectorization.  The used partial_vectors_supported_p predicate
> > only looks for while_ult while also support predication when
> > mask modes are integer modes as for AVX512.
> >
> > I've noticed this isn't very effective on x86_64 anyway since
> > if the main loop mode is autodetected we skip re-analyzing
> > mode_i == 0, but then mode_i == 1 is usually the very same
> > large mode.  This is fixed by the next patch.
> >
> > The following simplifies the logic by simply re-using the
> > already computed LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P from
> > the main loop to decide whether we can possibly use partial
> > vectors for the epilogue (for the case of having the same VF).
> > We remember the main loop analysis before a suggested unroll
> > factor is applied to avoid possible differences from that.
> >
> >     * tree-vect-loop.cc (vect_analyze_loop_1): New parameter
> >     to output whether the not unrolled loop can use partial
> >     vectors.
> >     (vect_analyze_loop): Use the main loop partial vector
> >     analysis result to decide if epilogues with the same VF
> >     can use partial vectors.
> > ---
> >  gcc/tree-vect-loop.cc | 25 ++++++++++++++++++-------
> >  1 file changed, 18 insertions(+), 7 deletions(-)
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index c824b5abaaf..fa022dfad42 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -3474,7 +3474,8 @@ vect_analyze_loop_1 (class loop *loop, 
> > vec_info_shared *shared,
> >                  loop_vec_info orig_loop_vinfo,
> >                  const vector_modes &vector_modes, unsigned &mode_i,
> >                  machine_mode &autodetected_vector_mode,
> > -                bool &fatal)
> > +                bool &fatal,
> > +                bool &loop_as_epilogue_supports_partial_vectors)
> >  {
> >    loop_vec_info loop_vinfo
> >      = vect_create_loop_vinfo (loop, shared, loop_form_info, 
> > orig_loop_vinfo);
> > @@ -3488,6 +3489,8 @@ vect_analyze_loop_1 (class loop *loop, 
> > vec_info_shared *shared,
> >    opt_result res = vect_analyze_loop_2 (loop_vinfo, fatal,
> >                                     &suggested_unroll_factor,
> >                                     slp_done_for_suggested_uf);
> > +  loop_as_epilogue_supports_partial_vectors
> > +    = LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo);
> >    if (dump_enabled_p ())
> >      dump_printf_loc (MSG_NOTE, vect_location,
> >                  "***** Analysis %s with vector mode %s\n",
> > @@ -3633,6 +3636,8 @@ vect_analyze_loop (class loop *loop, gimple 
> > *loop_vectorized_call,
> >    for (unsigned i = 0; i < vector_modes.length (); ++i)
> >      cached_vf_per_mode.safe_push (0);
> >  
> > +  bool supports_partial_vectors = false;
> > +
> >    /* First determine the main loop vectorization mode, either the first
> >       one that works, starting with auto-detecting the vector mode and then
> >       following the targets order of preference, or the one with the
> > @@ -3644,10 +3649,12 @@ vect_analyze_loop (class loop *loop, gimple 
> > *loop_vectorized_call,
> >        /* Set cached VF to -1 prior to analysis, which indicates a mode has
> >      failed.  */
> >        cached_vf_per_mode[last_mode_i] = -1;
> > +      bool loop_as_epilogue_supports_partial_vectors;
> >        opt_loop_vec_info loop_vinfo
> >     = vect_analyze_loop_1 (loop, shared, &loop_form_info,
> >                            NULL, vector_modes, mode_i,
> > -                          autodetected_vector_mode, fatal);
> > +                          autodetected_vector_mode, fatal,
> > +                          loop_as_epilogue_supports_partial_vectors);
> >        if (fatal)
> >     break;
> >  
> > @@ -3677,7 +3684,11 @@ vect_analyze_loop (class loop *loop, gimple 
> > *loop_vectorized_call,
> >           first_loop_vinfo = opt_loop_vec_info::success (NULL);
> >         }
> >       if (first_loop_vinfo == NULL)
> > -       first_loop_vinfo = loop_vinfo;
> > +       {
> > +         first_loop_vinfo = loop_vinfo;
> > +         supports_partial_vectors
> > +           = loop_as_epilogue_supports_partial_vectors;
> > +       }
> >       else
> >         {
> >           delete loop_vinfo;
> > @@ -3742,8 +3753,7 @@ vect_analyze_loop (class loop *loop, gimple 
> > *loop_vectorized_call,
> >      vector_modes[0] = autodetected_vector_mode;
> >    mode_i = 0;
> >  
> > -  bool supports_partial_vectors =
> > -    partial_vectors_supported_p () && param_vect_partial_vector_usage != 0;
> > +  supports_partial_vectors &= param_vect_partial_vector_usage != 0;
> >    poly_uint64 first_vinfo_vf = LOOP_VINFO_VECT_FACTOR (first_loop_vinfo);
> >  
> >    loop_vec_info orig_loop_vinfo = first_loop_vinfo;
> > @@ -3769,12 +3779,13 @@ vect_analyze_loop (class loop *loop, gimple 
> > *loop_vectorized_call,
> >                          "***** Re-trying epilogue analysis with vector "
> >                          "mode %s\n", GET_MODE_NAME (vector_modes[mode_i]));
> >  
> > -     bool fatal;
> > +     bool fatal, loop_as_epilogue_supports_partial_vectors;
> >       opt_loop_vec_info loop_vinfo
> >         = vect_analyze_loop_1 (loop, shared, &loop_form_info,
> >                                orig_loop_vinfo,
> >                                vector_modes, mode_i,
> > -                              autodetected_vector_mode, fatal);
> > +                              autodetected_vector_mode, fatal,
> > +                              loop_as_epilogue_supports_partial_vectors);
> >       if (fatal)
> >         break;
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to