https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixing the CSE in the testcase by doing

double a[1024];
void foo ()
{
  for (int i = 0; i < 1022; i += 2)
    {
      double tem = a[i+1];
      a[i] = tem * a[i];
      a[i+1] = a[i+2] * tem;
    }
}

gets us

t.c:4:21: note:   Detected interleaving load a[i_15] and a[_1]
t.c:4:21: note:   Detected interleaving store a[i_15] and a[_1]
t.c:4:21: note:   Detected interleaving load of size 2
t.c:4:21: note:         _2 = a[i_15];
t.c:4:21: note:         tem_10 = a[_1];
t.c:4:21: note:   Detected single element interleaving a[_4] step 16
t.c:4:21: note:   Detected interleaving store of size 2
t.c:4:21: note:         a[i_15] = _3;
t.c:4:21: note:         a[_1] = _6;

in the loop pass and failed dependence analysis and with the SLP pass
(no predcom):

t.c:10:1: note:   Detected interleaving load a[i_15] and a[_1]
t.c:10:1: note:   Detected interleaving load a[i_15] and a[_4]
t.c:10:1: note:   Detected interleaving store a[i_15] and a[_1]
t.c:10:1: note:   Detected interleaving load of size 3
t.c:10:1: note:         _2 = a[i_15];
t.c:10:1: note:         tem_10 = a[_1];
t.c:10:1: note:         _5 = a[_4];
t.c:10:1: note:   Detected interleaving store of size 2
t.c:10:1: note:         a[i_15] = _3;
t.c:10:1: note:         a[_1] = _6;

which then runs into gap vect issues for how we'd vectorize the three
element load.

The dependence analysis is done by analyzing the validity of the
vectorized load/store placement and the implied motion of the scalar
load/store statements.  The missed optimization here would be the missed
alternate placement that would be correct.  But I think the way we form
groups would need to be revisited first here.
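To make "placement" concrete, here is a minimal sketch in plain C. It is not
what the vectorizer emits. It assumes one possible placement for a single
loop iteration, with all three loads moved ahead of both stores, and checks
that this matches the scalar statement order from the testcase. The names
foo_scalar, foo_hoisted and count_mismatches are hypothetical and exist only
for this illustration.

```c
#define N 1024

/* Scalar reference: the loop from the testcase, statement order unchanged
   (the load of a[i+2] sits between the two stores).  */
static void foo_scalar (double *a)
{
  for (int i = 0; i < N - 2; i += 2)
    {
      double tem = a[i + 1];
      a[i] = tem * a[i];
      a[i + 1] = a[i + 2] * tem;
    }
}

/* Assumed alternate placement: the three loads of one iteration are done
   first, the two stores last.  This is legal within an iteration because
   neither store writes a[i+2].  */
static void foo_hoisted (double *a)
{
  for (int i = 0; i < N - 2; i += 2)
    {
      double x0 = a[i];
      double x1 = a[i + 1];
      double x2 = a[i + 2];
      a[i] = x1 * x0;
      a[i + 1] = x2 * x1;
    }
}

/* Run both variants on identical input and count differing elements.  */
int count_mismatches (void)
{
  static double ref[N], alt[N];
  for (int i = 0; i < N; i++)
    ref[i] = alt[i] = 1.0 + (i % 7) * 0.25;

  foo_scalar (ref);
  foo_hoisted (alt);

  int bad = 0;
  for (int i = 0; i < N; i++)
    if (ref[i] != alt[i])
      bad++;
  return bad;
}
```

Calling count_mismatches () should report no differences, since both
variants perform the same multiplications on the same values. The sketch
only covers motion within one iteration. It says nothing about the
cross-iteration placement the loop pass has to validate.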