https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97494

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, so for gcc.dg/vect/slp-11b.c we're now doing hybrid vectorization if a { 0
2 1 } load permute is supported.  The reason this is needed is that

      out[i*4] = (in[i*4] + 2) * 3;
      out[i*4 + 1] = (in[i*4 + 2] + 2) * 7;
      out[i*4 + 2] = (in[i*4 + 1] + 7) * 3;
      out[i*4 + 3] = (in[i*4 + 3] + 3) * 4;

is "inconsistently" folded like

      out[i*4] = (in[i*4] * 3 + 6);
      out[i*4 + 1] = (in[i*4 + 2] * 7 + 14);
      out[i*4 + 2] = (in[i*4 + 1] * 3 + 21);
      out[i*4 + 3] = (in[i*4 + 3] + 3) * 4;

so for (x + 3) * 4 we're _not_ associating.  That breaks SLP discovery,
but with splitting we're now using SLP for the first three lanes and
interleaving for the last.
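
To make the mismatch concrete, here is a minimal sketch that puts the
two forms side by side.  It is not the testcase itself: the unsigned int
element type, the group count N and all function names are assumptions
made for illustration.  Lanes 0-2 end up as "mul, then add" and read
in[] at offsets { 0 2 1 }, while lane 3 stays "add, then mul".

```c
#include <assert.h>
#include <stddef.h>

#define N 8  /* hypothetical number of 4-lane groups, not from the testcase */

/* The group as written in the source: every lane is (x + c1) * c2.  */
void kernel_source(unsigned int *out, const unsigned int *in)
{
  for (size_t i = 0; i < N; i++)
    {
      out[i*4]     = (in[i*4]     + 2) * 3;
      out[i*4 + 1] = (in[i*4 + 2] + 2) * 7;
      out[i*4 + 2] = (in[i*4 + 1] + 7) * 3;
      out[i*4 + 3] = (in[i*4 + 3] + 3) * 4;
    }
}

/* The group after folding: lanes 0-2 become x * c2 + c1*c2, lane 3 is
   left as (x + 3) * 4, so the four lanes no longer share one shape.  */
void kernel_folded(unsigned int *out, const unsigned int *in)
{
  for (size_t i = 0; i < N; i++)
    {
      out[i*4]     = in[i*4]     * 3 + 6;
      out[i*4 + 1] = in[i*4 + 2] * 7 + 14;
      out[i*4 + 2] = in[i*4 + 1] * 3 + 21;
      out[i*4 + 3] = (in[i*4 + 3] + 3) * 4;
    }
}

/* Returns 1 if both forms agree on a sample input.  With unsigned
   (wrapping) arithmetic the distribution is value-preserving, so only
   the operation order differs between the lanes.  */
int folded_matches_source(void)
{
  unsigned int in[N*4], a[N*4], b[N*4];

  for (size_t i = 0; i < N*4; i++)
    in[i] = (unsigned int) (i * 2654435761u);  /* arbitrary sample data */

  kernel_source(a, in);
  kernel_folded(b, in);

  for (size_t i = 0; i < N*4; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```

Both kernels compute the same values; the difference is purely in the
per-lane operation sequence that SLP discovery has to match.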
