https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451
Bug ID: 107451
Summary: Segmentation fault with vectorized code.
Product: gcc
Version: 11.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: bartoldeman at users dot sourceforge.net
Target Milestone: ---
Created attachment 53785
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53785&action=edit
Test case
The following code:
double dot(int n, const double *x, int inc_x, const double *y)
{
    int i, ix;
    double dot[4] = { 0.0, 0.0, 0.0, 0.0 };
    ix = 0;
    for (i = 0; i < n; i++) {
        dot[0] += x[ix] * y[ix];
        dot[1] += x[ix+1] * y[ix+1];
        dot[2] += x[ix] * y[ix+1];
        dot[3] += x[ix+1] * y[ix];
        ix += inc_x;
    }
    return dot[0] + dot[1] + dot[2] + dot[3];
}
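int main(void)
{
    /* Two elements each, so the source loop's reads of x[1] and y[1] are in bounds. */
    double x[2] = { 0.0, 0.0 }, y[2] = { 0.0, 0.0 };
    return dot(1, x, 4096*4096, y);
}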
crashes on Linux x86-64:
$ gcc -O2 -ftree-vectorize -march=haswell crash.c -o crash
$ ./crash
Segmentation fault
This happens with GCC 11.3.0 and with the current prerelease (gcc version 11.3.1 20221021), even with the patches from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107254 and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212 applied.
The assembly generated for the loop is as follows:
18: c5 f9 10 1e vmovupd (%rsi),%xmm3
1c: c5 f9 10 21 vmovupd (%rcx),%xmm4
20: ff c2 inc %edx
22: c4 e3 65 18 0c 06 01 vinsertf128 $0x1,(%rsi,%rax,1),%ymm3,%ymm1
29: c4 e3 5d 18 04 01 01 vinsertf128 $0x1,(%rcx,%rax,1),%ymm4,%ymm0
30: 48 01 c6 add %rax,%rsi
33: 48 01 c1 add %rax,%rcx
36: c4 e3 fd 01 c9 11 vpermpd $0x11,%ymm1,%ymm1
3c: c4 e3 fd 01 c0 14 vpermpd $0x14,%ymm0,%ymm0
42: c4 e2 f5 b8 d0 vfmadd231pd %ymm0,%ymm1,%ymm2
47: 39 fa cmp %edi,%edx
49: 75 cd jne 18 <dot+0x18>
What happens here is that the vinsertf128 instructions load the elements for the next loop iteration (x[ix+inc_x] and x[ix+inc_x+1], and likewise for y) and put them in the high halves of ymm1 and ymm0.
The vpermpd instructions (immediates 0x11 and 0x14) then discard those high halves again: they turn 1,2,3,4 into 2,1,2,1 and 1,2,2,1 respectively, so only the two low elements are ever used.
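The lane shuffling can be checked with a small scalar model of the loop body. This sketch assumes the documented vpermpd immediate encoding (two bits per destination lane, lowest lane first) and that %rsi holds x and %rcx holds y, per the x86-64 System V calling convention; vpermpd_model, selects_only_low_half and loop_body_model are hypothetical helpers, not intrinsics or GCC APIs. It shows that immediates 0x11 and 0x14 select only source lanes 0 and 1, and that the four products are exactly the dot[3], dot[2], dot[1] and dot[0] terms.

```c
#include <assert.h>

/* Scalar model of AVX2 vpermpd with an 8-bit immediate: destination lane i
   receives source lane (imm >> (2*i)) & 3, where lane 0 is the lowest lane. */
static void vpermpd_model(const double src[4], unsigned imm, double dst[4])
{
    assert(imm <= 0xff);
    for (int i = 0; i < 4; i++)
        dst[i] = src[(imm >> (2 * i)) & 3];
}

/* Returns 1 if imm selects only source lanes 0 and 1, i.e. the high 128 bits
   written by vinsertf128 never reach the destination. */
static int selects_only_low_half(unsigned imm)
{
    for (int i = 0; i < 4; i++)
        if (((imm >> (2 * i)) & 3) >= 2)
            return 0;
    return 1;
}

/* One iteration of the emitted loop body. xv and yv model ymm1 and ymm0
   before the permutes: lanes 0-1 hold x[ix], x[ix+1] (resp. y[ix], y[ix+1]),
   lanes 2-3 hold the next iteration's pair loaded by vinsertf128.
   The final loop models vfmadd231pd lane by lane (ignoring its single rounding). */
static void loop_body_model(const double xv[4], const double yv[4], double acc[4])
{
    double px[4], py[4];
    vpermpd_model(xv, 0x11, px); /* x1, x0, x1, x0 */
    vpermpd_model(yv, 0x14, py); /* y0, y1, y1, y0 */
    for (int i = 0; i < 4; i++)
        acc[i] += px[i] * py[i]; /* x1*y0, x0*y1, x1*y1, x0*y0 */
}
```

With x = {1, 2} and y = {3, 4} in the low lanes and arbitrary values in the high lanes, acc becomes {6, 4, 8, 3}, whose sum of 21 matches what the scalar dot() computes for the same inputs.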
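So the result is correct, but the superfluous vinsertf128 instructions can read memory past the end of x or y and thus produce a segfault. In the test case, n is 1 and inc_x is 4096*4096, so on the only iteration these loads read 4096*4096 * 8 bytes = 128 MiB beyond the start of x and y.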
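The extent of the over-read can be expressed as a simple index model. This sketch assumes, as the description above implies, that %rax holds the byte stride inc_x * sizeof(double); max_index_scalar and max_index_vectorized are hypothetical helpers, not part of the test case. For the test case the scalar loop reads only x[0] and x[1], whereas the vectorized loop also loads x[16777216] and x[16777217] (and likewise for y).

```c
#include <assert.h>

/* Hypothetical helper: largest element index of x (or y) read by the
   scalar source loop. Iteration i reads x[ix] and x[ix+1], ix = i * inc_x. */
static long long max_index_scalar(int n, long long inc_x)
{
    assert(n > 0 && inc_x >= 0);
    return (long long)(n - 1) * inc_x + 1;
}

/* Hypothetical helper: largest element index read by the vectorized loop,
   assuming %rax holds inc_x * sizeof(double). Each iteration loads x[ix] and
   x[ix+1] (vmovupd) plus x[ix+inc_x] and x[ix+inc_x+1] (vinsertf128), so the
   last iteration still touches the pair one stride further on. */
static long long max_index_vectorized(int n, long long inc_x)
{
    assert(n > 0 && inc_x >= 0);
    return (long long)n * inc_x + 1;
}
```

For example, max_index_scalar(1, 4096LL*4096) evaluates to 1, while max_index_vectorized(1, 4096LL*4096) evaluates to 16777217.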
Related issue (originating in OpenBLAS):
https://github.com/easybuilders/easybuild-easyconfigs/issues/16387
Possibly also related:
https://github.com/xianyi/OpenBLAS/issues/3740#issuecomment-1233899834
(That comment shows very similar code, but for GCC 12, which vectorizes by default at -O2. OpenBLAS worked around the problem by disabling the tree vectorizer there, but only on macOS and Windows.)
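As a hedged illustration of the same kind of workaround applied to a single function rather than a whole build, GCC's optimize attribute can turn off -ftree-vectorize (which covers both loop and SLP vectorization) for just this kernel. dot_novec is a hypothetical name, OpenBLAS's actual change may differ, and GCC's documentation describes the optimize attribute as intended mainly for debugging, so a build flag such as -fno-tree-vectorize is the more conventional route.

```c
/* Hypothetical per-function workaround: same kernel as the test case, but
   GCC's optimize attribute disables tree vectorization for this function only.
   Compilers that do not support the attribute may ignore it with a warning. */
__attribute__((optimize("no-tree-vectorize")))
double dot_novec(int n, const double *x, int inc_x, const double *y)
{
    double dot[4] = { 0.0, 0.0, 0.0, 0.0 };
    int ix = 0;

    for (int i = 0; i < n; i++) {
        dot[0] += x[ix] * y[ix];
        dot[1] += x[ix + 1] * y[ix + 1];
        dot[2] += x[ix] * y[ix + 1];
        dot[3] += x[ix + 1] * y[ix];
        ix += inc_x;
    }
    return dot[0] + dot[1] + dot[2] + dot[3];
}
```

With x = {1.0, 2.0}, y = {3.0, 4.0}, n = 1 and inc_x = 4096*4096, the scalar loop only touches x[0], x[1], y[0] and y[1] and returns 21.0.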