On Fri, 27 Jun 2025, Richard Sandiford wrote: > Richard Biener <rguent...@suse.de> writes: > > On Fri, 27 Jun 2025, Richard Biener wrote: > > > >> On Thu, 26 Jun 2025, Richard Sandiford wrote: > >> > >> > Richard Biener <rguent...@suse.de> writes: > >> > > The following fixes the computation of supports_partial_vectors which > >> > > is used to prune the set of modes to iterate over for epilog > >> > > vectorization. The used partial_vectors_supported_p predicate > >> > > only looks for while_ult while also support predication when > >> > > mask modes are integer modes as for AVX512. > >> > > > >> > > I've noticed this isn't very effective on x86_64 anyway since > >> > > if the main loop mode is autodetected we skip re-analyzing > >> > > mode_i == 0, but then mode_i == 1 is usually the very same > >> > > large mode. > >> > > > >> > > Thus I do wonder if we should instead always (or when > >> > > --param vect-partial-vector-usage != 0, or when the target would > >> > > support predication in principle) perform main loop analysis > >> > > with partial vectors in mind (start with can_use_partial_vectors_p = > >> > > true), but only at the end honor the --param when deciding on > >> > > using_partial_vectors_p. We can then remember > >> > > can_use_partial_vectors_p > >> > > for each analyzed mode and use that more specific info for the > >> > > pruning? > >> > > >> > Yeah, sounds like that could work. In principle, epilogue loops should > >> > be strictly easier to vectorise than main loops. If you know that the > >> > epilogue "loop" never iterates, there could in principle be cases > >> > where we'd need to clear can_use_partial_vectors_p for the main loop > >> > but not for the epilogue loop. I can't think of any situation like > >> > that off-hand though. Likewise for unrolling. > >> > >> So we already do analyze the main loop for partial vector usage when > >> --param vect-partial-vector-usage != 0, so for the purpose of > >> pruning epilogue analysis we should be able to use > >> LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P. > >> > >> As you say there might in theory be corner cases, like when > >> applying a suggested unroll factor to the main loop. I can't > >> think of a reason for when we don't, so we can in principle > >> just remember the analysis result without if required. > >> > >> But basically it would be like below, I'll post this separately > >> again so the CI can pick it up. > >> > >> Would that be OK as-is or do you think we should be looking > >> to deal with the unrolled main loop case preventively? > > > > It's easy enough to do, like with the following. So that's what > > I'm going to test. > > Argh! Sorry, just realised that this won't work for AArch64 after all. > > LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P doesn't include information about > whether the loop control is supported (good), but it does still contain > information about whether masking is supported for individual operations, > with that information being specific to the current vector mode, > rather than being a general statement about the target as a whole. > > So I think this would have the effect of preventing SVE epilogue loops > for Advanced SIMD main loops, which is something we currently support. > LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P would be false for any nontrivial > Advanced SIMD loop due to the lack of masked load/store support.
Oh, I see. I'll go with the original fix for this then. Richard. > Richard > > > > > Richard. > > > > From b0ae2522e8ddb3381e7e22995c0ce3e700c53755 Mon Sep 17 00:00:00 2001 > > From: Richard Biener <rguent...@suse.de> > > Date: Thu, 26 Jun 2025 11:08:04 +0200 > > Subject: [PATCH] Fixup partial_vectors_supported_p use > > To: gcc-patches@gcc.gnu.org > > > > The following fixes the computation of supports_partial_vectors which > > is used to prune the set of modes to iterate over for epilog > > vectorization. The used partial_vectors_supported_p predicate > > only looks for while_ult while also support predication when > > mask modes are integer modes as for AVX512. > > > > I've noticed this isn't very effective on x86_64 anyway since > > if the main loop mode is autodetected we skip re-analyzing > > mode_i == 0, but then mode_i == 1 is usually the very same > > large mode. This is fixed by the next patch. > > > > The following simplifies the logic by simply re-using the > > already computed LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P from > > the main loop to decide whether we can possibly use partial > > vectors for the epilogue (for the case of having the same VF). > > We remember the main loop analysis before a suggested unroll > > factor is applied to avoid possible differences from that. > > > > * tree-vect-loop.cc (vect_analyze_loop_1): New parameter > > to output whether the not unrolled loop can use partial > > vectors. > > (vect_analyze_loop): Use the main loop partial vector > > analysis result to decide if epilogues with the same VF > > can use partial vectors. > > --- > > gcc/tree-vect-loop.cc | 25 ++++++++++++++++++------- > > 1 file changed, 18 insertions(+), 7 deletions(-) > > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > > index c824b5abaaf..fa022dfad42 100644 > > --- a/gcc/tree-vect-loop.cc > > +++ b/gcc/tree-vect-loop.cc > > @@ -3474,7 +3474,8 @@ vect_analyze_loop_1 (class loop *loop, > > vec_info_shared *shared, > > loop_vec_info orig_loop_vinfo, > > const vector_modes &vector_modes, unsigned &mode_i, > > machine_mode &autodetected_vector_mode, > > - bool &fatal) > > + bool &fatal, > > + bool &loop_as_epilogue_supports_partial_vectors) > > { > > loop_vec_info loop_vinfo > > = vect_create_loop_vinfo (loop, shared, loop_form_info, > > orig_loop_vinfo); > > @@ -3488,6 +3489,8 @@ vect_analyze_loop_1 (class loop *loop, > > vec_info_shared *shared, > > opt_result res = vect_analyze_loop_2 (loop_vinfo, fatal, > > &suggested_unroll_factor, > > slp_done_for_suggested_uf); > > + loop_as_epilogue_supports_partial_vectors > > + = LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo); > > if (dump_enabled_p ()) > > dump_printf_loc (MSG_NOTE, vect_location, > > "***** Analysis %s with vector mode %s\n", > > @@ -3633,6 +3636,8 @@ vect_analyze_loop (class loop *loop, gimple > > *loop_vectorized_call, > > for (unsigned i = 0; i < vector_modes.length (); ++i) > > cached_vf_per_mode.safe_push (0); > > > > + bool supports_partial_vectors = false; > > + > > /* First determine the main loop vectorization mode, either the first > > one that works, starting with auto-detecting the vector mode and then > > following the targets order of preference, or the one with the > > @@ -3644,10 +3649,12 @@ vect_analyze_loop (class loop *loop, gimple > > *loop_vectorized_call, > > /* Set cached VF to -1 prior to analysis, which indicates a mode has > > failed. */ > > cached_vf_per_mode[last_mode_i] = -1; > > + bool loop_as_epilogue_supports_partial_vectors; > > opt_loop_vec_info loop_vinfo > > = vect_analyze_loop_1 (loop, shared, &loop_form_info, > > NULL, vector_modes, mode_i, > > - autodetected_vector_mode, fatal); > > + autodetected_vector_mode, fatal, > > + loop_as_epilogue_supports_partial_vectors); > > if (fatal) > > break; > > > > @@ -3677,7 +3684,11 @@ vect_analyze_loop (class loop *loop, gimple > > *loop_vectorized_call, > > first_loop_vinfo = opt_loop_vec_info::success (NULL); > > } > > if (first_loop_vinfo == NULL) > > - first_loop_vinfo = loop_vinfo; > > + { > > + first_loop_vinfo = loop_vinfo; > > + supports_partial_vectors > > + = loop_as_epilogue_supports_partial_vectors; > > + } > > else > > { > > delete loop_vinfo; > > @@ -3742,8 +3753,7 @@ vect_analyze_loop (class loop *loop, gimple > > *loop_vectorized_call, > > vector_modes[0] = autodetected_vector_mode; > > mode_i = 0; > > > > - bool supports_partial_vectors = > > - partial_vectors_supported_p () && param_vect_partial_vector_usage != 0; > > + supports_partial_vectors &= param_vect_partial_vector_usage != 0; > > poly_uint64 first_vinfo_vf = LOOP_VINFO_VECT_FACTOR (first_loop_vinfo); > > > > loop_vec_info orig_loop_vinfo = first_loop_vinfo; > > @@ -3769,12 +3779,13 @@ vect_analyze_loop (class loop *loop, gimple > > *loop_vectorized_call, > > "***** Re-trying epilogue analysis with vector " > > "mode %s\n", GET_MODE_NAME (vector_modes[mode_i])); > > > > - bool fatal; > > + bool fatal, loop_as_epilogue_supports_partial_vectors; > > opt_loop_vec_info loop_vinfo > > = vect_analyze_loop_1 (loop, shared, &loop_form_info, > > orig_loop_vinfo, > > vector_modes, mode_i, > > - autodetected_vector_mode, fatal); > > + autodetected_vector_mode, fatal, > > + loop_as_epilogue_supports_partial_vectors); > > if (fatal) > > break; > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)