https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97494
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, so for gcc.dg/vect/slp-11b.c we're now doing hybrid vectorization if a
{ 0 2 1 } load permute is supported. The reason this is needed is that
out[i*4] = (in[i*4] + 2) * 3;
out[i*4 + 1] = (in[i*4 + 2] + 2) * 7;
out[i*4 + 2] = (in[i*4 + 1] + 7) * 3;
out[i*4 + 3] = (in[i*4 + 3] + 3) * 4;
is "inconsistently" folded like
out[i*4] = (in[i*4] * 3 + 6);
out[i*4 + 1] = (in[i*4 + 2] * 7 + 14);
out[i*4 + 2] = (in[i*4 + 1] * 3 + 21);
out[i*4 + 3] = (in[i*4 + 3] + 3) * 4;
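so for (x + 3) * 4 we're _not_ associating. The first three lanes now have
a PLUS at the root while the last one still has a MULT, so the four stores
are no longer isomorphic. That breaks SLP discovery for the full group, but
with splitting we're now using SLP for the first three lanes and
interleaving for the last.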
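
For reference, a minimal standalone sketch of the two forms, to show that
the folding only changes the shape of the expressions and not the values
stored. This is not the testcase itself: N, the unsigned int element type,
the input pattern and the function names (kernel_source, kernel_folded,
check_forms_agree) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define N 8 /* hypothetical iteration count, not taken from the testcase */

/* Source form: every lane has the shape (x + c0) * c1, MULT at the root.  */
static void
kernel_source (unsigned int *out, const unsigned int *in)
{
  for (size_t i = 0; i < N; i++)
    {
      out[i*4] = (in[i*4] + 2) * 3;
      out[i*4 + 1] = (in[i*4 + 2] + 2) * 7;
      out[i*4 + 2] = (in[i*4 + 1] + 7) * 3;
      out[i*4 + 3] = (in[i*4 + 3] + 3) * 4;
    }
}

/* Folded form as quoted above: lanes 0-2 become x * c1 + c0*c1 (PLUS at
   the root), lane 3 keeps (x + 3) * 4 (MULT at the root).  */
static void
kernel_folded (unsigned int *out, const unsigned int *in)
{
  for (size_t i = 0; i < N; i++)
    {
      out[i*4] = in[i*4] * 3 + 6;
      out[i*4 + 1] = in[i*4 + 2] * 7 + 14;
      out[i*4 + 2] = in[i*4 + 1] * 3 + 21;
      out[i*4 + 3] = (in[i*4 + 3] + 3) * 4;
    }
}

/* Run both forms on the same input and assert that every element agrees.
   Unsigned arithmetic wraps, so the distribution is exact even when the
   products overflow.  Returns the number of elements compared.  */
int
check_forms_agree (void)
{
  unsigned int in[N*4], a[N*4], b[N*4];

  for (size_t i = 0; i < N*4; i++)
    in[i] = (unsigned int) i * 2654435761u; /* arbitrary spread of values */

  kernel_source (a, in);
  kernel_folded (b, in);

  for (size_t i = 0; i < N*4; i++)
    assert (a[i] == b[i]);

  return N*4;
}
```

Calling check_forms_agree () from a small main is enough to exercise it.
The { 0 2 1 } permute is visible in both kernels: output lanes 0, 1, 2 read
input lanes 0, 2, 1, while lane 3 reads lane 3.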