On Wed, Aug 27, 2025 at 4:53 PM Richard Biener
<richard.guent...@gmail.com> wrote:
>
> On Wed, Aug 27, 2025 at 6:57 AM liuhongt <hongtao....@intel.com> wrote:
> >
> > In the BB vectorizer, kind == vec_perm may not be a real vec_perm, just
> > a broadcast or a simple load.
>
> Btw, you can now (in some cases) do better, namely you should
> always have 'node' available and when SLP_TREE_PERMUTE_P (node)
> then SLP_TREE_LANE_PERMUTATION could be inspected to
> detect the harmful cross-lane permutes.  Note BB vectorization
> still (always IIRC) uses SLP_TREE_LOAD_PERMUTATION,
> so for permuted loads you have a load 'node' and the permutation
> applied is visible in SLP_TREE_LOAD_PERMUTATION (which is
> a simpler data structure).  That said, BB vectorization loads
> could have harmful AVX2 permutes attached, so the patch may be
> a bit overzealous.
Thanks, I'll try.
>
> Richard.
>
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ready to push to trunk.
> >
> > gcc/ChangeLog:
> >
> >         * config/i386/i386.cc (ix86_vector_costs::finish_cost):
> >         Restrict tune avx256_avoid_vec_perm to loop vectorization
> >         only.
> > ---
> >  gcc/config/i386/i386.cc | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index 55c9b16dd38..5a02e12d634 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -26305,15 +26305,15 @@ ix86_vector_costs::finish_cost (const vector_costs *scalar_costs)
> >           && (exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo).to_constant ())
> >               > ceil_log2 (LOOP_VINFO_INT_NITERS (loop_vinfo))))
> >         m_costs[vect_body] = INT_MAX;
> > +
> > +      for (int i = 0; i != 3; i++)
> > +       if (m_num_avx256_vec_perm[i]
> > +           && TARGET_AVX256_AVOID_VEC_PERM)
> > +         m_costs[i] = INT_MAX;
> >      }
> >
> >    ix86_vect_estimate_reg_pressure ();
> >
> > -  for (int i = 0; i != 3; i++)
> > -    if (m_num_avx256_vec_perm[i]
> > -       && TARGET_AVX256_AVOID_VEC_PERM)
> > -      m_costs[i] = INT_MAX;
> > -
> >    /* When X86_TUNE_AVX512_TWO_EPILOGUES is enabled arrange for both
> >       a AVX2 and a SSE epilogue for AVX512 vectorized loops.  */
> >    if (loop_vinfo
> > --
> > 2.34.1
> >



-- 
BR,
Hongtao
