https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81189
Bug ID: 81189
Summary: Out of bounds memory access introduced by the vectoriser
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Keywords: wrong-code
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ktkachov at gcc dot gnu.org
Target Milestone: ---

The testcase gcc.dg/vect/O3-pr70130.c performs an out-of-bounds access when vectorised on aarch64 (I didn't check other targets). Compile with -O3.

The problematic function is Loop_err:

void Loop_err (struct foo *img, const int s[16][2], int s0)
{
  int i, j;

  for (j = 0; j < 16; j++)
    {
      for (i = 0; i < 16; i++)
        {
          img->a[0][j][i] = s[i][0];
          img->a[1][j][i] = s[j][1];
          img->a[2][j][i] = s0;
        }
    }
}

The part of the assembly code that performs the loads from s[j][1] is the problematic one:

        ...
        add     x4, x1, 4       // Add a +4 offset to 's' to access s[j][1]
        ...
.L7:
        ldr     q0, [x4], 8     // <---- V4SI load from s[j][1] onwards
        add     x2, x2, 32
        str     q4, [x2, -32]
        cmp     x5, x2
        dup     v0.4s, v0.s[0]
        str     q3, [x2, -16]
        str     q1, [x2, 992]
        xtn     v2.4h, v0.4s
        xtn2    v2.8h, v0.4s
        str     q1, [x2, 1008]
        str     q2, [x2, 480]
        str     q2, [x2, 496]
        bne     .L7

The array passed as 's' is defined as:

const int s[16][2] = { { 1, 16 }, { 2, 15 }, { 3, 14 }, { 4, 13 },
                       { 5, 12 }, { 6, 11 }, { 7, 10 }, { 8, 9 },
                       { 9, 8 }, { 10, 7 }, { 11, 6 }, { 12, 5 },
                       { 13, 4 }, { 14, 3 }, { 15, 2 }, { 16, 1 } };

So the V4SI load marked above loads 4 ints at a time, starting from the second element in each entry of 's'. If I step through the execution in gdb, I see the loop reaching iteration 15, at which point it loads { s[14][1], s[15][0], s[15][1], s[16][0] }, where s[16][0] is out-of-bounds.

GDB shows the contents of Q0 after the load as (formatted for readability):

s = {0x00020001 0x00000001 0x00000010 0x00000002}

As you can see, the 4th element (0x00020001) is bogus (presumably from an adjacent constant pool entry), but because the code after the load doesn't use it (it only cares about element 0), it doesn't cause any problems in this instance.
It is, however, an out-of-bounds access, so we should fix it. Sorry, I can't come up with an aborting testcase; I guess that since the OOB memory access is only 4 bytes past the end of the array in the constant pool, the system doesn't trap it.
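
The over-read described above can be checked with a bit of index arithmetic. The following is only a rough sketch, not part of the testcase: the helper names oob_lanes and first_oob_iteration are hypothetical, and it assumes the vectorised loop issues exactly one 4-element load per value of j, starting at flat int index 1 (the +4 byte offset) and advancing by 2 ints (the 8-byte post-increment).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of elements read past the end of an array
   of `len` elements by a `width`-element vector load that starts at flat
   element index `first`.  */
size_t
oob_lanes (size_t first, size_t width, size_t len)
{
  assert (width > 0);
  size_t end = first + width;   /* one past the last element loaded */
  if (end <= len)
    return 0;
  size_t over = end - len;
  return over < width ? over : width;
}

/* Hypothetical helper: 0-based index of the first iteration whose load
   runs past the end of the array, or -1 if no iteration does.  Assumes
   one load per iteration, `stride` elements apart.  */
long
first_oob_iteration (size_t start, size_t stride, size_t width,
                     size_t len, size_t iterations)
{
  for (size_t j = 0; j < iterations; j++)
    if (oob_lanes (start + j * stride, width, len) > 0)
      return (long) j;
  return -1;
}
```

With the values from this report (start 1, stride 2, width 4, 32 ints in 's', 16 iterations), first_oob_iteration (1, 2, 4, 32, 16) returns 14, i.e. the fifteenth pass, and oob_lanes (29, 4, 32) is 1, which agrees with the single bogus element seen in gdb. Under the same assumptions the sixteenth pass would over-read three elements (oob_lanes (31, 4, 32) is 3); that figure comes from the model only and was not observed in the debugger.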