The following avoids re-analyzing the loop as an epilogue when partial vectors are not in use and the candidate mode is the same as the autodetected vector mode, whose VF is already too high for a non-predicated loop. This is almost always the situation on x86, so the change saves us one re-analysis there unless --param vect-partial-vector-usage is set to a non-default value.
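
As a rough illustration of the pruning described above, here is a minimal standalone sketch. It is not the GCC code: `Mode`, `related_mode` and `epilogue_candidates` are hypothetical stand-ins for `machine_mode`, `related_vector_mode` and the mode loop in `vect_analyze_loop`, VFs are plain integers rather than poly-ints, and index 0 of the cached VF array is assumed to hold the VF obtained for the autodetected mode.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a vector machine_mode: element width and
// total vector width, both in bits.
struct Mode {
  unsigned elem_bits;
  unsigned total_bits;
  bool operator==(const Mode &o) const {
    return elem_bits == o.elem_bits && total_bits == o.total_bits;
  }
};

// Simplified stand-in for related_vector_mode: same total width, but with
// the requested element width.  The real hook can fail or pick another
// mode; this sketch ignores that.
static Mode related_mode(Mode m, unsigned elem_bits) {
  return Mode{elem_bits, m.total_bits};
}

// Return the indices of the modes still worth analyzing for the epilogue.
// cached_vf[i] is the VF recorded for modes[i], or 0 if that mode was never
// analyzed.  Assumption: cached_vf[0] belongs to the autodetected mode.
std::vector<std::size_t>
epilogue_candidates(const std::vector<Mode> &modes,
                    const std::vector<unsigned> &cached_vf,
                    Mode autodetected, unsigned first_vf,
                    bool supports_partial_vectors) {
  std::vector<std::size_t> keep;
  for (std::size_t i = 0; i < modes.size(); ++i) {
    if (!supports_partial_vectors) {
      // Existing check: this mode's own cached VF is already too high.
      if (cached_vf[i] >= first_vf)
        continue;
      // Added check: the mode was skipped earlier, so it has no cached VF,
      // but it maps to the autodetected mode and back again, and the
      // autodetected mode's VF is too high.
      if (cached_vf[0] >= first_vf
          && related_mode(modes[i], autodetected.elem_bits) == autodetected
          && related_mode(autodetected, modes[i].elem_bits) == modes[i])
        continue;
    }
    keep.push_back(i);
  }
  return keep;
}
```

With an autodetected 256-bit mode at index 0, a second 256-bit mode that was never analyzed on its own, and a 128-bit mode, only the 128-bit mode remains a candidate when partial vectors are unavailable; with partial vectors all three are kept.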

Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

	* tree-vect-loop.cc (vect_analyze_loop): Prune epilogue analysis
	further when not using partial vectors.
---
 gcc/tree-vect-loop.cc | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b91ef4a2325..d9091c6c705 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3770,6 +3770,26 @@ vect_analyze_loop (class loop *loop, gimple *loop_vectorized_call,
 		break;
 	      continue;
 	    }
+	  /* We would need an exhaustive search to find all modes we
+	     skipped but that would lead to the same result as another
+	     and where we'd could check cached_vf_per_mode against.
+	     Check for the autodetected mode, which is the common
+	     situation on x86 which does not perform cost comparison.  */
+	  if (!supports_partial_vectors
+	      && maybe_ge (cached_vf_per_mode[0], first_vinfo_vf)
+	      && VECTOR_MODE_P (autodetected_vector_mode)
+	      && (related_vector_mode (vector_modes[mode_i],
+				       GET_MODE_INNER (autodetected_vector_mode))
+		  == autodetected_vector_mode)
+	      && (related_vector_mode (autodetected_vector_mode,
+				       GET_MODE_INNER (vector_modes[mode_i]))
+		  == vector_modes[mode_i]))
+	    {
+	      mode_i++;
+	      if (mode_i == vector_modes.length ())
+		break;
+	      continue;
+	    }
 
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_NOTE, vect_location,
-- 
2.43.0