https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81635
--- Comment #1 from Tom de Vries <vries at gcc dot gnu.org> ---

I.

The test-case slp.c (minus dg-final checks) looks like this:
...
/* { dg-options "-O2 -ftree-slp-vectorize" } */

int p[1000] __attribute__((aligned(8)));
int p2[1000] __attribute__((aligned(8)));

void __attribute__((noinline, noclone))
foo ()
{
  unsigned int a, b;
  unsigned int i;

  for (i = 0; i < 1000; i += 2)
    {
      a = p[i];
      b = p[i+1];
      p2[i] = a;
      p2[i+1] = b;
    }
}
...

Changing the type of the loop iteration variable 'i' from 'unsigned int' to
'int' makes the slp.c test pass again.

II.

With int, both data references have the same 'offset from base address', and
'constant offset from base address' values of 0 and 4:
...
Creating dr for p[i_13]
analyze_innermost: success.
        base_address: &p
        offset from base address: (ssizetype) ((sizetype) i_13 * 4)
        constant offset from base address: 0
        step: 0
        base alignment: 8
        base misalignment: 0
        offset alignment: 8
        step alignment: 128
        base_object: p[i_13]
Creating dr for p[_1]
analyze_innermost: success.
        base_address: &p
        offset from base address: (ssizetype) ((sizetype) i_13 * 4)
        constant offset from base address: 4
        step: 0
        base alignment: 8
        base misalignment: 0
        offset alignment: 8
        step alignment: 128
        base_object: p[_1]
...
resulting in:
...
gcc/testsuite/gcc.target/nvptx/slp.c:13:3: note: Detected interleaving load p[i_13] and p[_1]
...

III.

With unsigned int, the two data references have different 'offset from base
address' values (note that _1 == i_13 + 1):
...
Creating dr for p[i_13]
analyze_innermost: success.
        base_address: &p
        offset from base address: (ssizetype) ((sizetype) i_13 * 4)
        constant offset from base address: 0
        step: 0
        base alignment: 8
        base misalignment: 0
        offset alignment: 8
        step alignment: 128
        base_object: p[i_13]
Creating dr for p[_1]
analyze_innermost: success.
        base_address: &p
        offset from base address: (ssizetype) ((sizetype) _1 * 4)
        constant offset from base address: 0
        step: 0
        base alignment: 8
        base misalignment: 0
        offset alignment: 4
        step alignment: 128
        base_object: p[_1]
...
resulting in:
...
gcc/testsuite/gcc.target/nvptx/slp.c:13:3: note: not consecutive access b_6 = p[_1];
gcc/testsuite/gcc.target/nvptx/slp.c:13:3: note: not consecutive access a_5 = p[i_13];
...

IV.

On x86_64 -m32, the test-case is not vectorized (reason: 'unrolling required
in basic block SLP'), but the interleaving load is recognized, both with int
and unsigned int.

On x86_64 -m64, we have:
- for int, the interleaving load is detected, but the test-case is not
  vectorized (reason: 'unrolling required in basic block SLP')
- for unsigned int, the interleaving load is not detected, just as in III.
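
The comment above does not say why the index type matters. One plausible
explanation, offered here as an assumption rather than as the confirmed root
cause, is that 'i + 1' in unsigned int wraps modulo 2^32 before it is widened
to a 64-bit sizetype, so '(sizetype) (i + 1) * 4' cannot in general be split
into '(sizetype) i * 4' plus a constant 4. With a 32-bit sizetype the two forms
always agree, which would be consistent with the -m32 observation in IV. The
sketch below only illustrates that arithmetic; the function names are
hypothetical, and uint64_t / uint32_t stand in for a 64-bit / 32-bit sizetype.
...
#include <stdint.h>

/* Byte offset of p[i + 1] with a 64-bit stand-in for sizetype:
   the index i + 1 is computed in 32-bit unsigned arithmetic first
   (wrapping modulo 2^32) and only then widened and scaled.  */
uint64_t
offset_wrapping64 (uint32_t i)
{
  uint32_t i1 = i + 1u;
  return (uint64_t) i1 * 4u;
}

/* The form with the constant part split out: i * 4 + 4,
   computed entirely in 64 bits.  */
uint64_t
offset_split64 (uint32_t i)
{
  return (uint64_t) i * 4u + 4u;
}

/* The same two forms with a 32-bit stand-in for sizetype.
   Everything wraps modulo 2^32, so they are always equal.  */
uint32_t
offset_wrapping32 (uint32_t i)
{
  uint32_t i1 = i + 1u;
  return (uint32_t) (i1 * 4u);
}

uint32_t
offset_split32 (uint32_t i)
{
  return (uint32_t) (i * 4u + 4u);
}
...
For every index the loop in I actually visits (0 to 998), the two 64-bit forms
agree. They differ only for i == UINT32_MAX, where the wrapping form yields 0
and the split form yields 2^34. That single case is enough to make the split
invalid unless the value range of 'i' is known.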