This patch series adds a new enum and routines for classifying a vector
load or store implementation.  Originally there were three motivations:

  (1) Reduce cut-and-paste.

  (2) Make the chosen vectorisation strategy more obvious.  At the
      moment this is derived implicitly from various other bits of
      state (GROUPED, STRIDED, SLP, etc.)

  (3) Decouple the vectorisation strategy from those other bits of state,
      so that there can be a choice of implementation for a given scalar
      statement.  The specific problem here is that we class:

          for (...)
            {
              ... = a[i * x];
              ... = a[i * x + 1];
            }

      as "strided and grouped" but:

          for (...)
            {
              ... = a[i * 7];
              ... = a[i * 7 + 1];
            }

      as "non-strided and grouped".  Before the patches, "strided and
      grouped" loads would always try to use separate scalar loads
      while "non-strided and grouped" loads would always try to use
      load-and-permute.  But load-and-permute is never supported for
      a group size of 7, so the effect was that the first loop was
      vectorisable and the second wasn't.  It seemed odd that not
      knowing x (but accepting it could be 7) would allow more
      optimisation opportunities than knowing x is 7.
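
      For anyone who wants to try the two cases, here is a minimal
      self-contained sketch of the loops above.  The function names and
      the summing of the loaded values are only illustrative assumptions
      and do not come from the patches or the testsuite; the only thing
      that matters is the pair of accesses in each loop body.

```c
#include <stddef.h>

/* Stride only known at run time: the "strided and grouped" case.
   Hypothetical name, for illustration only.  */
int
sum_pairs_variable_stride (const int *a, size_t n, size_t x)
{
  int sum = 0;
  for (size_t i = 0; i < n; i++)
    {
      sum += a[i * x];
      sum += a[i * x + 1];
    }
  return sum;
}

/* Stride known to be 7 at compile time: the "non-strided and
   grouped" case, with a group size of 7.  Hypothetical name.  */
int
sum_pairs_stride_7 (const int *a, size_t n)
{
  int sum = 0;
  for (size_t i = 0; i < n; i++)
    {
      sum += a[i * 7];
      sum += a[i * 7 + 1];
    }
  return sum;
}
```

      Compiling this with something like "gcc -O3 -fopt-info-vec-missed"
      should show what the vectoriser makes of each loop, although the
      result will depend on the GCC version and the target.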

Unfortunately, it looks like we underestimate the cost of separate
scalar accesses on at least aarch64, so I've disabled (3) for now;
see the "if" statement at the end of get_load_store_type in patch 6.
I think the series still does (1) and (2) though, so that's the
justification for it in its current form.  It also means that (3)
is now simply a case of removing the FIXME code, once the cost model
problems have been sorted out.  (I did wonder about adding a --param,
but that seems overkill.  I hope to get back to this during GCC 7 stage 1.)

Thanks,
Richard
