https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83202
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                        |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                    |ASSIGNED
   Last reconfirmed|                               |2017-11-29
             Blocks|                               |53947
           Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
     Ever confirmed|0                              |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
With += 4 the inner loop doesn't iterate, so it's effectively

    void test(double data[4][4])
    {
      for (int i = 0; i < 4; i++)
        {
          data[i][i] = data[i][i] * data[i][i];
          data[i][i+1] = data[i][i+1] * data[i][i+1];
        }
    }

We fail to SLP here because we get confused by the computed group size of 5,
as there's a gap of three elements between the first stores of each
iteration. When later doing BB vectorization we fail to analyze dependences,
likely because we don't analyze the refs as thoroughly as we do for loops.

For your second example we fail to loop-vectorize because we completely peel
the inner loop in cunrolli, leaving control flow inside the loop. I have a
patch for that one.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations