This fixes SLP permute propagation so that it no longer propagates permutes across operations that have different semantics on different lanes, such as the recently added COMPLEX_ADD_ROT90.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-06-24  Richard Biener  <rguent...@suse.de>

	* tree-vect-slp.c (vect_optimize_slp): Do not propagate
	across operations that have different semantics on different
	lanes.
---
 gcc/tree-vect-slp.c | 30 +++++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 29db56ed532..69ee8faed09 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3680,18 +3680,34 @@ vect_optimize_slp (vec_info *vinfo)
         {
           int idx = ipo[i-1];
           slp_tree node = vertices[idx].node;
-          /* For leafs there's nothing to do - we've seeded permutes
-             on those above. */
-          if (SLP_TREE_DEF_TYPE (node) != vect_internal_def)
+
+          /* Handle externals and constants optimistically throughout the
+             iteration. */
+          if (SLP_TREE_DEF_TYPE (node) == vect_external_def
+              || SLP_TREE_DEF_TYPE (node) == vect_constant_def)
             continue;

           vertices[idx].visited = 1;

-          /* We cannot move a permute across a store. */
-          if (STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (node))
-              && DR_IS_WRITE
-                   (STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (node))))
+          /* We do not handle stores with a permutation. */
+          stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (node);
+          if (STMT_VINFO_DATA_REF (rep)
+              && DR_IS_WRITE (STMT_VINFO_DATA_REF (rep)))
             continue;

+          /* We cannot move a permute across an operation that is
+             not independent on lanes. Note this is an explicit
+             negative list since that's much shorter than the respective
+             positive one but it's critical to keep maintaining it. */
+          if (is_gimple_call (STMT_VINFO_STMT (rep)))
+            switch (gimple_call_combined_fn (STMT_VINFO_STMT (rep)))
+              {
+              case CFN_COMPLEX_ADD_ROT90:
+              case CFN_COMPLEX_ADD_ROT270:
+              case CFN_COMPLEX_MUL:
+              case CFN_COMPLEX_MUL_CONJ:
+                continue;
+              default:;
+              }
+
           int perm = -1;
           for (graph_edge *succ = slpg->vertices[idx].succ;
-- 
2.26.2