https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106081
--- Comment #10 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #9)
> So I can adjust change_layout_cost in a bit awkward way, but it seems that
> internal_node_cost would already work out that a permute can be merged into
> an existing permute.
Right.
> It seems that existing permutes are not recorded in the "layout".
They should be if they're bijective, via:
  else if (SLP_TREE_CODE (node) == VEC_PERM_EXPR
           && SLP_TREE_CHILDREN (node).length () == 1
           && (child = SLP_TREE_CHILDREN (node)[0])
           && (TYPE_VECTOR_SUBPARTS (SLP_TREE_VECTYPE (child))
               .is_constant (&imin)))
    {
      /* If the child has the same vector size as this node,
         reversing the permutation can make the permutation a no-op.
         In other cases it can change a true permutation into a
         full-vector extract.  */
      tmp_perm.reserve (SLP_TREE_LANES (node));
      for (unsigned j = 0; j < SLP_TREE_LANES (node); ++j)
        tmp_perm.quick_push (SLP_TREE_LANE_PERMUTATION (node)[j].second);
    }
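
For readers following along: the snippet above only collects the source
lane of each output lane; whether that sequence is usable as a layout
depends on it being a bijection. A minimal standalone sketch of such a
test, and of "reversing" a bijective permutation, might look like the
following. It uses std::vector rather than GCC's vec types, and both
function names are hypothetical, not taken from tree-vect-slp.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a GCC function): true if PERM uses every
// lane index in [0, perm.size ()) exactly once.
bool is_bijective_permutation(const std::vector<unsigned>& perm)
{
  std::vector<bool> seen(perm.size(), false);
  for (unsigned lane : perm)
    {
      // Out-of-range or repeated lanes mean the mapping is not a bijection.
      if (lane >= perm.size() || seen[lane])
        return false;
      seen[lane] = true;
    }
  return true;
}

// Hypothetical helper: the inverse of a bijective permutation, so that
// inverse[perm[i]] == i for every lane i.
std::vector<unsigned> invert_permutation(const std::vector<unsigned>& perm)
{
  assert(is_bijective_permutation(perm));
  std::vector<unsigned> inverse(perm.size());
  for (std::size_t i = 0; i < perm.size(); ++i)
    inverse[perm[i]] = static_cast<unsigned>(i);
  return inverse;
}
```

Under this model { 1 0 3 2 } is bijective (and its own inverse), while a
splat-like { 0 0 1 1 } is not and so could not be a candidate layout.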
> Also vectorizable_slp_permutation_1 doesn't try to elide permutes that
> are noop based on knowledge of the layout of 'node', say a permute
> { 1 0 3 2 } of a { _1, _1, _2, _2 } node would be noop.
To do that in general, I think we'd need to value-number each
element of each node (which sounds doable). But I guess doing
it at leaves would be better than nothing.
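
To make the value-numbering idea concrete, here is a rough standalone
sketch. It assumes each lane of a node has already been given a value
number (equal numbers meaning "same scalar value"); how those numbers
would be computed inside the vectorizer is not shown, and the function
name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a GCC function).  LANE_VALUES[i] is the value
// number of lane i of the input node; PERM[i] is the input lane that
// output lane i reads.  The permute is a no-op if every output lane ends
// up with the value it already had.
bool permute_is_noop(const std::vector<unsigned>& lane_values,
                     const std::vector<unsigned>& perm)
{
  // Precondition for this sketch: one permute index per lane.
  assert(perm.size() == lane_values.size());
  for (std::size_t i = 0; i < perm.size(); ++i)
    {
      if (perm[i] >= lane_values.size())
        return false;
      if (lane_values[perm[i]] != lane_values[i])
        return false;
    }
  return true;
}
```

With lane values { 1, 1, 2, 2 } standing for { _1, _1, _2, _2 }, the
permute { 1 0 3 2 } is reported as a no-op, whereas { 2 3 0 1 } is not.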
> But change_layout_cost does MAX (count, 1) on its result anyway.
At the moment, yes, because having from_layout_i != to_layout_i
for identical layouts would be a consistency failure.
> The following elides the unnecessary permutation for this special case
> (but not the general case):
>
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index e4430248ab5..e9048a61891 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -4389,6 +4389,19 @@ vect_optimize_slp_pass::change_layout_cost (slp_tree node,
>    if (from_layout_i == to_layout_i)
>      return 0;
>
> +  /* When there's a uniform load permutation permutating that in any
> +     way is free.  */
> +  if (SLP_TREE_LOAD_PERMUTATION (node).exists ())
> +    {
> +      unsigned l = SLP_TREE_LOAD_PERMUTATION (node)[0];
> +      unsigned i;
> +      for (i = 1; i < SLP_TREE_LOAD_PERMUTATION (node).length (); ++i)
> +       if (SLP_TREE_LOAD_PERMUTATION (node)[i] != l)
> +         break;
> +      if (i == SLP_TREE_LOAD_PERMUTATION (node).length ())
> +       return 0;
> +    }
> +
>    auto_vec<slp_tree, 1> children (1);
>    children.quick_push (node);
>    auto_lane_permutation_t perm (SLP_TREE_LANES (node));
>
> I'm not sure this is the correct place to factor in cost savings
> materialization would give. Is it?
Yeah, I think so. The patch LGTM. I don't know if it's worth
caching the “all the same element” result, but probably not.
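
For reference, the "all the same element" test from the quoted patch can
be written as a small standalone predicate. This is only a sketch: it
takes a std::vector as a stand-in for SLP_TREE_LOAD_PERMUTATION (node),
treats an empty vector like a nonexistent permutation, and the function
name is hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not a GCC function): true if every lane of the
// load permutation selects the same element, so that any reordering of
// the lanes yields the same vector.
bool is_uniform_load_permutation(const std::vector<unsigned>& load_perm)
{
  // Roughly corresponds to the .exists () guard in the patch.
  if (load_perm.empty())
    return false;
  const unsigned first = load_perm.front();
  return std::all_of(load_perm.begin() + 1, load_perm.end(),
                     [first](unsigned lane) { return lane == first; });
}
```

The scan is linear in the number of lanes, which is consistent with the
guess that caching the result is probably not worthwhile.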
> Are explicit VEC_PERM nodes also still there in the optimization
> process or are they turned into sth implicit?
They're still there. The current algorithm inherits the old
restriction that candidate layouts must be bijective, and not
all VEC_PERM_EXPRs are. So some VEC_PERM_EXPRs would have to
be explicit whatever happens. Same for non-bijective load
permutations.