https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123672
--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
forwprop1 changes:
a_7 = *x_6(D);
b_9 = *y_8(D);
- c_10 = VEC_PERM_EXPR <a_7, a_7, { 0, 2, 0, 2 }>;
- d_11 = VEC_PERM_EXPR <a_7, a_7, { 1, 3, 1, 3 }>;
+ c_10 = VEC_PERM_EXPR <a_7, b_9, { 0, 2, 4, 4 }>;
+ d_11 = VEC_PERM_EXPR <a_7, b_9, { 1, 3, 5, 5 }>;
e_12 = VEC_PERM_EXPR <b_9, b_9, { 0, 2, 0, 2 }>;
f_13 = VEC_PERM_EXPR <b_9, b_9, { 1, 3, 1, 3 }>;
_1 = c_10 + d_11;
_2 = c_10 - d_11;
g_14 = VEC_PERM_EXPR <_1, _2, { 0, 4, 1, 5 }>;
_3 = e_12 + f_13;
_4 = e_12 - f_13;
- h_15 = VEC_PERM_EXPR <_3, _4, { 0, 4, 1, 5 }>;
+ h_15 = VEC_PERM_EXPR <_1, _2, { 2, 6, 3, 7 }>;
*x_6(D) = g_14;
*y_8(D) = h_15;
return;
The problem is the selectors on the two new VEC_PERM_EXPRs: they should have
been { 0, 2, 4, 6 } and { 1, 3, 5, 7 }.
Because 4 and 5 each appear twice, only 2 lanes of b (lanes 0 and 1) are
actually used, whereas previously all 4 lanes were used.
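To make the effect concrete, here is a small C sketch that emulates VEC_PERM_EXPR semantics on plain 4-lane int arrays. The helper names vec_perm, h_before, h_after and selectors_ok are hypothetical illustrations, not GCC internals. The sketch computes h_15 both the pre-forwprop1 way (from e_12/f_13) and from the merged c_10/d_11 permutes, so the emitted { 0, 2, 4, 4 } / { 1, 3, 5, 5 } selectors can be compared against { 0, 2, 4, 6 } / { 1, 3, 5, 7 }.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical emulation of VEC_PERM_EXPR <v0, v1, sel> on 4-lane int
   vectors: selector values 0..3 pick lanes of v0, 4..7 pick lanes of v1. */
static void vec_perm(const int v0[4], const int v1[4], const int sel[4],
                     int out[4])
{
  for (int i = 0; i < 4; i++) {
    assert(sel[i] >= 0 && sel[i] < 8);
    out[i] = sel[i] < 4 ? v0[sel[i]] : v1[sel[i] - 4];
  }
}

static void vec_add(const int x[4], const int y[4], int out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = x[i] + y[i];
}

static void vec_sub(const int x[4], const int y[4], int out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = x[i] - y[i];
}

/* h_15 as computed before forwprop1: e_12/f_13 from b, then _3/_4, then
   the { 0, 4, 1, 5 } blend. */
static void h_before(const int b[4], int h[4])
{
  static const int even[4] = {0, 2, 0, 2}, odd[4] = {1, 3, 1, 3};
  static const int blend[4] = {0, 4, 1, 5};
  int e[4], f[4], s[4], t[4];
  vec_perm(b, b, even, e);
  vec_perm(b, b, odd, f);
  vec_add(e, f, s);
  vec_sub(e, f, t);
  vec_perm(s, t, blend, h);
}

/* h_15 as computed after forwprop1: c_10/d_11 merge a and b using the
   given selectors, and h_15 takes the upper lanes of _1/_2 via
   { 2, 6, 3, 7 }. */
static void h_after(const int a[4], const int b[4], const int csel[4],
                    const int dsel[4], int h[4])
{
  static const int upper_blend[4] = {2, 6, 3, 7};
  int c[4], d[4], s[4], t[4];
  vec_perm(a, b, csel, c);
  vec_perm(a, b, dsel, d);
  vec_add(c, d, s);
  vec_sub(c, d, t);
  vec_perm(s, t, upper_blend, h);
}

/* True if the merged permutes reproduce the original h_15. */
static bool selectors_ok(const int a[4], const int b[4], const int csel[4],
                         const int dsel[4])
{
  int want[4], got[4];
  h_before(b, want);
  h_after(a, b, csel, dsel, got);
  return memcmp(want, got, sizeof want) == 0;
}
```

With a = {10, 11, 12, 13} and b = {20, 25, 31, 40}, the original sequence yields h_15 = {45, -5, 71, -9}. The selectors forwprop1 emitted give {45, -5, 45, -5}, because b[2] and b[3] never reach lanes 2 and 3 of c_10/d_11, while the corrected selectors reproduce the original result.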