https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463
--- Comment #33 from Richard Biener <rguenth at gcc dot gnu.org> ---
I see.  Note it's SLP discovery's association code that figures out an SLP
graph; disabling this ends up with single-lane (store-lanes) from the start.
The association that "succeeds" first wins, and it's an unfortunate one (for
SLP pattern detection).

The thing is that the re-association greedily figures the best operand order
as well.  We start with

t.c:3:21: note: pre-sorted chains of plus_expr
plus_expr _19 plus_expr _27 minus_expr _26
plus_expr _18 minus_expr _29 minus_expr _28

and if we'd start with

plus_expr _19 plus_expr _27 minus_expr _26
plus_expr _18 minus_expr _28 minus_expr _29

instead we get the desired SLP pattern match, but store-lanes is still
preferred it seems (not sure how we got away with no store-lanes in GCC 13).
We could simply refuse to override the SLP graph with load/store-lanes when
patterns were found:

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 3892e1be3f2..4fb57a76f85 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -5064,7 +5065,7 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size,
      to cancel SLP when this applied to all instances in a loop but now
      we decide this per SLP instance.  It's important to do this only
      after SLP pattern recognition.  */
-  if (is_a <loop_vec_info> (vinfo))
+  if (!pattern_found && is_a <loop_vec_info> (vinfo))
     FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (vinfo), i, instance)
       if (SLP_INSTANCE_KIND (instance) == slp_inst_kind_store
           && !SLP_INSTANCE_TREE (instance)->ldst_lanes)

When starting with the swapped ops above we then get the desired code again.
I've hacked that in with

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 3892e1be3f2..4fb57a76f85 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -2275,6 +2275,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
           /* 1. pre-sort according to def_type and operation.  */
           for (unsigned lane = 0; lane < group_size; ++lane)
             chains[lane].stablesort (dt_sort_cmp, vinfo);
+          std::swap (chains[1][2], chains[1][1]);
           if (dump_enabled_p ())
             {
               dump_printf_loc (MSG_NOTE, vect_location,

It happens that in this specific case the optimal operand order matches stmt
order, so the following produces that - but I'm not positively sure that's
always good (though the 'stablesort' also tries to not disturb order - but in
this case it's the DFS order collecting the scalar ops).  In reality there's
not enough info on the op or its definition to locally decide a better order
for future pattern matching.

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 3892e1be3f2..f21e8b909ff 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1684,7 +1684,12 @@ dt_sort_cmp (const void *op1_, const void *op2_, void *)
   auto *op2 = (const chain_op_t *) op2_;
   if (op1->dt != op2->dt)
     return (int)op1->dt - (int)op2->dt;
-  return (int)op1->code - (int)op2->code;
+  if ((int)op1->code != (int)op2->code)
+    return (int)op1->code - (int)op2->code;
+  if (TREE_CODE (op1->op) == SSA_NAME && TREE_CODE (op2->op) == SSA_NAME)
+    return (gimple_uid (SSA_NAME_DEF_STMT (op1->op))
+            - gimple_uid (SSA_NAME_DEF_STMT (op2->op)));
+  return 0;
 }
 
 /* Linearize the associatable expression chain at START with the

That said, I don't have a good idea on how to make this work better, not even
after re-doing SLP discovery.  Maybe SLP patterns need to work on the initial
single-lane SLP graph?  But then we'd have to find lane-matches on two
unconnected SLP sub-graphs, which complicates the pattern matching part.  We
basically form SLP nodes from two sets of (two lanes) plus/minus ops (three
each), but we of course try to avoid SLP build of all 3! permutations possible
and stop at the first one that succeeds.
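
To illustrate the tie-breaking idea from the dt_sort_cmp change outside of GCC, here is a minimal standalone sketch.  It is not GCC code: ChainOp, chain_op_less and presort_lane are hypothetical names, the numeric operation codes are made up, and def_uid merely stands in for gimple_uid of the defining statement.  It also assumes that _28 is defined before _29, which the dump above does not show.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for GCC's chain_op_t; field names are illustrative.
struct ChainOp {
  int dt;            // definition kind (stand-in for the def type)
  int code;          // operation code; 0 = plus, 1 = minus are assumed values
  unsigned def_uid;  // stand-in for gimple_uid of the defining statement
  std::string name;  // SSA name, kept only for display
};

// Order by dt, then by code, then by definition order as the final
// tie-breaker, so ops with equal dt and code follow statement order.
inline bool chain_op_less(const ChainOp &a, const ChainOp &b) {
  return std::tie(a.dt, a.code, a.def_uid) < std::tie(b.dt, b.code, b.def_uid);
}

// Pre-sort one lane and return the resulting SSA names in order.
inline std::vector<std::string> presort_lane(std::vector<ChainOp> lane) {
  std::stable_sort(lane.begin(), lane.end(), chain_op_less);
  std::vector<std::string> names;
  for (const ChainOp &op : lane)
    names.push_back(op.name);
  return names;
}
```

Under those assumptions, feeding the second lane (_18, _29, _28) through presort_lane yields _18, _28, _29, i.e. the operand order that led to the SLP pattern match.  The sketch compares the uids directly rather than subtracting them, which sidesteps any unsigned wrap-around; whether statement order is the right tie-breaker in general remains the open question raised above.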