https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixing the CSE in the testcase by doing

double a[1024];
void foo ()
{
  for (int i = 0; i < 1022; i += 2)
    {
      double tem = a[i+1];
      a[i] = tem * a[i];
      a[i+1] = a[i+2] * tem;
    }
}

gets us

t.c:4:21: note:   Detected interleaving load a[i_15] and a[_1]
t.c:4:21: note:   Detected interleaving store a[i_15] and a[_1]
t.c:4:21: note:   Detected interleaving load of size 2
t.c:4:21: note:         _2 = a[i_15];
t.c:4:21: note:         tem_10 = a[_1];
t.c:4:21: note:   Detected single element interleaving a[_4] step 16
t.c:4:21: note:   Detected interleaving store of size 2
t.c:4:21: note:         a[i_15] = _3;
t.c:4:21: note:         a[_1] = _6;

in the loop vectorizer, where dependence analysis then fails, and
with the SLP pass (no predcom):

t.c:10:1: note:   Detected interleaving load a[i_15] and a[_1]
t.c:10:1: note:   Detected interleaving load a[i_15] and a[_4]
t.c:10:1: note:   Detected interleaving store a[i_15] and a[_1]
t.c:10:1: note:   Detected interleaving load of size 3
t.c:10:1: note:         _2 = a[i_15];
t.c:10:1: note:         tem_10 = a[_1];
t.c:10:1: note:         _5 = a[_4];
t.c:10:1: note:   Detected interleaving store of size 2
t.c:10:1: note:         a[i_15] = _3;
t.c:10:1: note:         a[_1] = _6;

which then runs into gap vectorization issues with how we'd vectorize
the three-element load.

The dependence analysis works by checking the validity of the
vectorized load/store placement and the implied motion of the
scalar load/store statements.  The missed optimization here is
that an alternate placement, which would be correct, is not
considered.  But I think the way we form groups would need to be
revisited first.
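
To make the placement question concrete, here is a hand-written sketch
(not vectorizer output).  It assumes a block of two scalar iterations,
and the names foo_scalar, foo_loads_first and placements_agree are
hypothetical.  For this testcase each iteration reads only a[i], a[i+1]
and a[i+2], while earlier iterations have written only elements below
i, so doing all loads of a block before any of its stores should leave
the array contents unchanged.  The sketch checks exactly that:

```c
#include <string.h>

#define N 1024

/* Scalar reference: same statement order as the testcase.  */
static void
foo_scalar (double *p)
{
  for (int i = 0; i < 1022; i += 2)
    {
      double tem = p[i+1];
      p[i] = tem * p[i];
      p[i+1] = p[i+2] * tem;
    }
}

/* Illustrative alternate placement: two scalar iterations per block,
   with every load of the block hoisted above every store.  */
static void
foo_loads_first (double *p)
{
  int i = 0;
  for (; i + 2 < 1022; i += 4)
    {
      /* All loads of both iterations first.  */
      double x0 = p[i], x1 = p[i+1], x2 = p[i+2];
      double x3 = p[i+3], x4 = p[i+4];
      /* Then all stores.  */
      p[i] = x1 * x0;
      p[i+1] = x2 * x1;
      p[i+2] = x3 * x2;
      p[i+3] = x4 * x3;
    }
  /* Scalar epilogue for the leftover iteration (511 is odd).  */
  for (; i < 1022; i += 2)
    {
      double tem = p[i+1];
      p[i] = tem * p[i];
      p[i+1] = p[i+2] * tem;
    }
}

/* Returns 1 if both placements produce bit-identical arrays.  */
int
placements_agree (void)
{
  static double ref[N], alt[N];
  for (int i = 0; i < N; i++)
    ref[i] = alt[i] = 1.0 + (i % 7) * 0.25;
  foo_scalar (ref);
  foo_loads_first (alt);
  return memcmp (ref, alt, sizeof ref) == 0;
}
```

Calling placements_agree () from a small main is enough to run the
check.  It only shows that such a placement exists for this loop; it
says nothing about how the vectorizer would form the groups needed to
find it.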
