On Wed, Aug 27, 2025 at 4:53 PM Richard Biener <richard.guent...@gmail.com> wrote:
>
> On Wed, Aug 27, 2025 at 6:57 AM liuhongt <hongtao....@intel.com> wrote:
> >
> > In the BB vectorizer, kind == vec_perm may not be a real vec_perm,
> > just a broadcast or a simple load.
>
> Btw, you can now (in some cases) do better: you should always have
> 'node' available, and when SLP_TREE_PERMUTE_P (node) holds,
> SLP_TREE_LANE_PERMUTATION can be inspected to detect the harmful
> cross-lane permutes. Note that BB vectorization still (always, IIRC)
> uses SLP_TREE_LOAD_PERMUTATION, so for permuted loads you have a load
> 'node' and the applied permutation is visible in
> SLP_TREE_LOAD_PERMUTATION (which is a simpler data structure). That
> said, BB vectorization loads could have harmful AVX2 permutes attached,
> so the patch is maybe a bit overzealous.

Thanks, I'll try.

>
> Richard.
>
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ready to push to trunk.
> >
> > gcc/ChangeLog:
> >
> >         * config/i386/i386.cc (ix86_vector_costs::finish_cost):
> >         Restrict tune avx256_avoid_vec_perm to loop vectorization
> >         only.
> > ---
> >  gcc/config/i386/i386.cc | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index 55c9b16dd38..5a02e12d634 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -26305,15 +26305,15 @@ ix86_vector_costs::finish_cost (const vector_costs *scalar_costs)
> >            && (exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo).to_constant ())
> >                > ceil_log2 (LOOP_VINFO_INT_NITERS (loop_vinfo))))
> >          m_costs[vect_body] = INT_MAX;
> > +
> > +      for (int i = 0; i != 3; i++)
> > +        if (m_num_avx256_vec_perm[i]
> > +            && TARGET_AVX256_AVOID_VEC_PERM)
> > +          m_costs[i] = INT_MAX;
> >      }
> >
> >    ix86_vect_estimate_reg_pressure ();
> >
> > -  for (int i = 0; i != 3; i++)
> > -    if (m_num_avx256_vec_perm[i]
> > -        && TARGET_AVX256_AVOID_VEC_PERM)
> > -      m_costs[i] = INT_MAX;
> > -
> >    /* When X86_TUNE_AVX512_TWO_EPILOGUES is enabled arrange for both
> >       a AVX2 and a SSE epilogue for AVX512 vectorized loops.  */
> >    if (loop_vinfo
> > --
> > 2.34.1
> >
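A minimal sketch of the cross-lane check Richard describes, under stated assumptions: the lane permutation is modelled as (operand, element) pairs, loosely after SLP_TREE_LANE_PERMUTATION, every operand is assumed to be a full 256-bit vector with its elements packed contiguously, and the helper name crosses_128bit_lanes is hypothetical (it is not a GCC API). A real version would have to map SLP lanes to vector elements using the node's vector type. The idea is that an AVX2 permute is only costly when some output element is read from the other 128-bit half, which needs a lane-crossing instruction such as vpermps or vperm2f128.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, NOT part of GCC.  PERM[i] = {operand, src} means
// output element i is taken from element SRC of input OPERAND.
// Assumes 256-bit vectors packed contiguously and ELEM_BITS in {8,16,32,64}.
bool
crosses_128bit_lanes (const std::vector<std::pair<unsigned, unsigned>> &perm,
                      unsigned elem_bits)
{
  const unsigned per_vec = 256 / elem_bits;   // elements per 256-bit vector
  const unsigned per_half = 128 / elem_bits;  // elements per 128-bit lane
  for (std::size_t i = 0; i < perm.size (); ++i)
    {
      // 128-bit half of the output vector that element i lands in.
      unsigned dst_half = (static_cast<unsigned> (i) % per_vec) / per_half;
      // 128-bit half of the source vector it is read from.  The operand
      // index is ignored: per-lane two-input shuffles (vunpcklps, vshufps)
      // only need every element to stay within its own half.
      unsigned src_half = (perm[i].second % per_vec) / per_half;
      if (dst_half != src_half)
        return true;   // needs a lane-crossing shuffle
    }
  return false;        // all elements stay within their 128-bit lane
}
```

In the cost hook, a check like this could gate the m_num_avx256_vec_perm increment so that in-lane permutes are not penalized; that wiring, and the separate handling of SLP_TREE_LOAD_PERMUTATION for permuted loads, is left out here.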
--
BR,
Hongtao