https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97494
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, so for gcc.dg/vect/slp-11b.c we're now doing hybrid vectorization if a
{ 0 2 1 } load permute is supported. The reason this is needed is that

  out[i*4]     = (in[i*4] + 2) * 3;
  out[i*4 + 1] = (in[i*4 + 2] + 2) * 7;
  out[i*4 + 2] = (in[i*4 + 1] + 7) * 3;
  out[i*4 + 3] = (in[i*4 + 3] + 3) * 4;

is "inconsistently" folded like

  out[i*4]     = (in[i*4] * 3 + 6);
  out[i*4 + 1] = (in[i*4 + 2] * 7 + 14);
  out[i*4 + 2] = (in[i*4 + 1] * 3 + 21);
  out[i*4 + 3] = (in[i*4 + 3] + 3) * 4;

so for (x + 3) * 4 we're _not_ associating. That breaks SLP discovery, but
with splitting we're now using SLP for the first three lanes and interleaving
for the last one.
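To make the lane mismatch concrete, here is a minimal scalar sketch in C. It assumes unsigned int elements (so the wrap-around arithmetic is well defined; the testcase's real types may differ), and the names group_as_written, group_as_folded, group_split and count_mismatches are hypothetical. group_split only emulates the idea of the split (lanes 0-2 as one group doing multiply-then-add after a { 0 2 1 } load permute, lane 3 handled separately); it is not how GCC implements it.

```c
#include <assert.h>
#include <stddef.h>

/* One group of four lanes as written in the loop body
   (in[i*4 + k] is written as in[k] here). */
static void group_as_written(const unsigned in[4], unsigned out[4])
{
  out[0] = (in[0] + 2) * 3;
  out[1] = (in[2] + 2) * 7;
  out[2] = (in[1] + 7) * 3;
  out[3] = (in[3] + 3) * 4;
}

/* The same group after the folding: lanes 0-2 become x * C1 + C2
   (multiply, then add) while lane 3 stays (x + 3) * 4 (add, then multiply). */
static void group_as_folded(const unsigned in[4], unsigned out[4])
{
  out[0] = in[0] * 3 + 6;
  out[1] = in[2] * 7 + 14;
  out[2] = in[1] * 3 + 21;
  out[3] = (in[3] + 3) * 4;
}

/* Emulation of the split: lanes 0-2 share one operation shape and read
   their inputs through the load permutation { 0 2 1 }; lane 3 is separate. */
static void group_split(const unsigned in[4], unsigned out[4])
{
  static const unsigned perm[3] = { 0, 2, 1 };
  static const unsigned mul[3]  = { 3, 7, 3 };
  static const unsigned add[3]  = { 6, 14, 21 };
  for (size_t lane = 0; lane < 3; lane++)
    out[lane] = in[perm[lane]] * mul[lane] + add[lane];
  out[3] = (in[3] + 3) * 4;
}

/* Runs all three forms over n_groups groups of generated inputs and
   returns how many output lanes disagree with the as-written form. */
size_t count_mismatches(size_t n_groups)
{
  size_t bad = 0;
  for (size_t i = 0; i < n_groups; i++)
    {
      unsigned in[4], w[4], f[4], s[4];
      for (size_t k = 0; k < 4; k++)
        in[k] = (unsigned) (i * 2654435761u + k * 40503u); /* arbitrary pattern */
      group_as_written(in, w);
      group_as_folded(in, f);
      group_split(in, s);
      for (size_t k = 0; k < 4; k++)
        bad += (w[k] != f[k]) + (w[k] != s[k]);
    }
  return bad;
}
```

Calling count_mismatches(1000) from a main should return 0: the folding is value-preserving, the problem is only that lane 3 ends up with a different operation order (plus feeding mult instead of mult feeding plus), so the four lanes no longer form one isomorphic SLP group.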