https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106081

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, I see we actually materialize a permute before the splat:

t.c:14:24: note:   node 0x5b311c0 (max_nunits=1, refcnt=2) vector(2) double
t.c:14:24: note:   op: VEC_PERM_EXPR
t.c:14:24: note:        stmt 0 _1 = *k_50;
t.c:14:24: note:        stmt 1 _1 = *k_50;
t.c:14:24: note:        stmt 2 _1 = *k_50;
t.c:14:24: note:        stmt 3 _1 = *k_50;
t.c:14:24: note:        lane permutation { 0[3] 0[2] 0[1] 0[0] }
t.c:14:24: note:        children 0x5b30fc0
t.c:14:24: note:   node 0x5b30fc0 (max_nunits=2, refcnt=1) vector(2) double
t.c:14:24: note:   op template: _1 = *k_50;
t.c:14:24: note:        stmt 0 _1 = *k_50;
t.c:14:24: note:        stmt 1 _1 = *k_50;
t.c:14:24: note:        stmt 2 _1 = *k_50;
t.c:14:24: note:        stmt 3 _1 = *k_50;
t.c:14:24: note:        load permutation { 0 0 0 0 }

this is because vect_optimize_slp_pass::get_result_with_layout doesn't
seem to consider load permutations?  It's the value-numbering we perform
that elides the redundant permute in the end.
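
To illustrate why the permute in the dump is redundant, here is a minimal
standalone sketch.  It is not GCC code: apply_lane_perm and
permute_is_redundant are hypothetical helpers, and permutations are modelled
as plain index vectors rather than SLP nodes.  It assumes a single child, so
a lane entry such as 0[3] is reduced to the lane index 3.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a GCC function): lane i of the result takes
// whatever the child loaded into lane lane_perm[i].
std::vector<unsigned> apply_lane_perm(const std::vector<unsigned>& load_perm,
                                      const std::vector<unsigned>& lane_perm) {
  std::vector<unsigned> out(lane_perm.size());
  for (std::size_t i = 0; i < lane_perm.size(); ++i) {
    assert(lane_perm[i] < load_perm.size());  // lane index must be in range
    out[i] = load_perm[lane_perm[i]];
  }
  return out;
}

// Hypothetical helper: the VEC_PERM_EXPR is a no-op when permuting the
// loaded lanes reproduces the load permutation unchanged.
bool permute_is_redundant(const std::vector<unsigned>& load_perm,
                          const std::vector<unsigned>& lane_perm) {
  return apply_lane_perm(load_perm, lane_perm) == load_perm;
}
```

With the load permutation { 0 0 0 0 } and the lane permutation { 3 2 1 0 }
from the dump, every result lane still reads element 0, so
permute_is_redundant returns true.  For a non-splat load such as { 0 1 2 3 }
the same lane permutation is not redundant.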

I have the feeling something doesn't fit together exactly: during
materialize () we apply the chosen layout to the partitions and then
eliminate redundant permutes, but we still end up with
get_result_with_layout adding the above permute.

I can add the same logic I've added to change_layout_cost to
get_result_with_layout as well, but as said, it feels like I'm missing
something ...
