https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463

--- Comment #33 from Richard Biener <rguenth at gcc dot gnu.org> ---
I see.  Note it's SLP discoveries association code that figures out a SLP
graph, disabling this ends up with single-lane (store-lanes) from the start. 
The association that "succeeds" first wins, and it's an unfortunate one (for
SLP pattern detection).

The thing is that the re-association greedily figures the best operand
order as well.  We start with

t.c:3:21: note:   pre-sorted chains of plus_expr
plus_expr _19 plus_expr _27 minus_expr _26 
plus_expr _18 minus_expr _29 minus_expr _28

and if we'd start with

plus_expr _19 plus_expr _27 minus_expr _26 
plus_expr _18 minus_expr _28 minus_expr _29 

instead we get the desired SLP pattern match but still store-lanes is
prefered it seems (not sure how we got away with no store-lanes in GCC 13).
We could simply refuse to override the SLP graph with laod/store-lanes
when patterns were found:

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 3892e1be3f2..4fb57a76f85 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -5064,7 +5065,7 @@ vect_analyze_slp (vec_info *vinfo, unsigned
max_tree_size,
      to cancel SLP when this applied to all instances in a loop but now
      we decide this per SLP instance.  It's important to do this only
      after SLP pattern recognition.  */
-  if (is_a <loop_vec_info> (vinfo))
+  if (!pattern_found && is_a <loop_vec_info> (vinfo))
     FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (vinfo), i, instance)
       if (SLP_INSTANCE_KIND (instance) == slp_inst_kind_store
          && !SLP_INSTANCE_TREE (instance)->ldst_lanes)

when starting with the swapped ops above we then get the desired code
again.  I've hacked that in with

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 3892e1be3f2..4fb57a76f85 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -2275,6 +2275,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
          /* 1. pre-sort according to def_type and operation.  */
          for (unsigned lane = 0; lane < group_size; ++lane)
            chains[lane].stablesort (dt_sort_cmp, vinfo);
+         std::swap (chains[1][2], chains[1][1]);
          if (dump_enabled_p ())
            {
              dump_printf_loc (MSG_NOTE, vect_location,

it happens that in this specific case the optimal operand order matches
stmt order so the following produces that - but I'm not positively sure
that's always good (though the 'stablesort' also tries to not disturb
order - but in this case it's the DFS order collecting the scalar ops).
In reality there's not enough info on the op or its definition to locally
decide a better order for future pattern matching.

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 3892e1be3f2..f21e8b909ff 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1684,7 +1684,12 @@ dt_sort_cmp (const void *op1_, const void *op2_, void *)
   auto *op2 = (const chain_op_t *) op2_;
   if (op1->dt != op2->dt)
     return (int)op1->dt - (int)op2->dt;
-  return (int)op1->code - (int)op2->code;
+  if ((int)op1->code != (int)op2->code)
+    return (int)op1->code - (int)op2->code;
+  if (TREE_CODE (op1->op) == SSA_NAME && TREE_CODE (op2->op) == SSA_NAME)
+    return (gimple_uid (SSA_NAME_DEF_STMT (op1->op))
+           - gimple_uid (SSA_NAME_DEF_STMT (op2->op)));
+  return 0;
 }

 /* Linearize the associatable expression chain at START with the


That said, I don't have a good idea on how to make this work better, not
even after re-doing SLP discovery.  Maybe SLP patterns need to work on the
initial single-lane SLP graph?  But then we'd have to find lane-matches
on two unconnected SLP sub-graphs which complicates the pattern matching
part.

We basically form SLP nodes from two sets of (two lanes) plus/minus ops
(three each) but we of course try to avoid SLP build of all 3! permutations
possible and stop at the first one that succeeds.

Reply via email to