On Wed, Aug 27, 2025 at 6:57 AM liuhongt <hongtao....@intel.com> wrote:
>
> kind == vec_perm may not be a real vec_perm in the BB vectorizer; it
> can be just a broadcast or a simple load.

Btw, you can now (in some cases) do better: you should always have
'node' available, and when SLP_TREE_PERMUTE_P (node) is true,
SLP_TREE_LANE_PERMUTATION can be inspected to detect the harmful
cross-lane permutes.  Note that BB vectorization still (always, IIRC)
uses SLP_TREE_LOAD_PERMUTATION, so for permuted loads you have a load
'node' and the applied permutation is visible in
SLP_TREE_LOAD_PERMUTATION (which is a simpler data structure).  That
said, BB vectorization loads could have harmful AVX2 permutes attached,
so the patch is maybe a bit overzealous.
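
To illustrate what "inspecting the permutation" could look like, here is
a minimal standalone sketch.  It is not GCC code: crosses_128bit_lane is
a hypothetical helper, and a plain std::vector<unsigned> stands in for
the load permutation.  It assumes the permutation describes exactly one
256-bit vector, with perm[i] naming the source element that feeds
destination element i.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only; not part of GCC.
// perm[i] is the index of the source element that feeds destination
// element i.  elts_per_lane is the number of elements in one 128-bit
// lane (e.g. 4 for 32-bit elements).
// Returns true if any destination element takes its value from a
// different 128-bit lane than the one it lives in.
static bool
crosses_128bit_lane (const std::vector<unsigned> &perm,
                     unsigned elts_per_lane)
{
  assert (elts_per_lane != 0);
  for (std::size_t i = 0; i < perm.size (); ++i)
    if (perm[i] / elts_per_lane != i / elts_per_lane)
      return true;
  return false;
}
```

With 8 x 32-bit elements, { 1, 0, 3, 2, 5, 4, 7, 6 } stays within lanes,
while { 4, 5, 6, 7, 0, 1, 2, 3 } swaps the two lanes and is flagged.  A
splat such as { 0, 0, 0, 0, 0, 0, 0, 0 } is flagged as well; whether a
broadcast should count as harmful is a separate decision that this
sketch does not make.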

Richard.

> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ready to push to trunk.
>
> gcc/ChangeLog:
>
>         * config/i386/i386.cc (ix86_vector_costs::finish_cost):
>         Restrict tune avx256_avoid_vec_perm to loop vectorization
>         only.
> ---
>  gcc/config/i386/i386.cc | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 55c9b16dd38..5a02e12d634 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -26305,15 +26305,15 @@ ix86_vector_costs::finish_cost (const vector_costs *scalar_costs)
>           && (exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo).to_constant ())
>               > ceil_log2 (LOOP_VINFO_INT_NITERS (loop_vinfo))))
>         m_costs[vect_body] = INT_MAX;
> +
> +      for (int i = 0; i != 3; i++)
> +       if (m_num_avx256_vec_perm[i]
> +           && TARGET_AVX256_AVOID_VEC_PERM)
> +         m_costs[i] = INT_MAX;
>      }
>
>    ix86_vect_estimate_reg_pressure ();
>
> -  for (int i = 0; i != 3; i++)
> -    if (m_num_avx256_vec_perm[i]
> -       && TARGET_AVX256_AVOID_VEC_PERM)
> -      m_costs[i] = INT_MAX;
> -
>    /* When X86_TUNE_AVX512_TWO_EPILOGUES is enabled arrange for both
>       a AVX2 and a SSE epilogue for AVX512 vectorized loops.  */
>    if (loop_vinfo
> --
> 2.34.1
>
