https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107451

            Bug ID: 107451
           Summary: Segmentation fault with vectorized code.
           Product: gcc
           Version: 11.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bartoldeman at users dot sourceforge.net
  Target Milestone: ---

Created attachment 53785
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53785&action=edit
Test case

The following code:

double dot(int n, const double *x, int inc_x, const double *y)
{
        int i, ix;
        double dot[4] = { 0.0, 0.0, 0.0, 0.0 } ; 

        ix=0;
        for(i = 0; i < n; i++) {
                dot[0] += x[ix]   * y[ix]   ;
                dot[1] += x[ix+1] * y[ix+1] ;
                dot[2] += x[ix]   * y[ix+1] ;
                dot[3] += x[ix+1] * y[ix]   ;
                ix += inc_x ;
        }

        return dot[0] + dot[1] + dot[2] + dot[3];
}

int main(void)
{
        double x = 0, y = 0;
        return dot(1, &x, 4096*4096, &y);
}

crashes with (on Linux x86-64)

$ gcc -O2 -ftree-vectorize -march=haswell crash.c -o crash
$ ./crash
Segmentation fault

for GCC 11.3.0 and also the current prerelease (gcc version 11.3.1 20221021),
and also when patched with the patches from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107254 and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107212.

The loop code assembly is as follows:

  18:   c5 f9 10 1e             vmovupd (%rsi),%xmm3
  1c:   c5 f9 10 21             vmovupd (%rcx),%xmm4
  20:   ff c2                   inc    %edx
  22:   c4 e3 65 18 0c 06 01    vinsertf128 $0x1,(%rsi,%rax,1),%ymm3,%ymm1
  29:   c4 e3 5d 18 04 01 01    vinsertf128 $0x1,(%rcx,%rax,1),%ymm4,%ymm0
  30:   48 01 c6                add    %rax,%rsi
  33:   48 01 c1                add    %rax,%rcx
  36:   c4 e3 fd 01 c9 11       vpermpd $0x11,%ymm1,%ymm1
  3c:   c4 e3 fd 01 c0 14       vpermpd $0x14,%ymm0,%ymm0
  42:   c4 e2 f5 b8 d0          vfmadd231pd %ymm0,%ymm1,%ymm2
  47:   39 fa                   cmp    %edi,%edx
  49:   75 cd                   jne    18 <dot+0x18>

What happens here is that each vinsertf128 instruction loads the two elements
that belong to the next loop iteration (at offset %rax from the current
pointer, the same stride by which %rsi and %rcx are advanced), and those get
put in the high halves of ymm0 and ymm1.
The vpermpd instructions then throw away those high halves again, so e.g. they
turn 1,2,3,4 into 2,1,2,1 and 1,2,2,1 respectively.

So the result is correct, but the superfluous vinsertf128 loads access memory
potentially past the end of x or y and thus produce a segfault.
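
To put numbers on the over-read, here is a small C model of the highest
element index each version of the loop reads. It is only a sketch: the helper
names are hypothetical, inc_x is assumed to be positive, and the formula for
the vectorized loop is read off the assembly listing above rather than taken
from GCC itself.

```c
/* Hypothetical helpers, not part of the test case. Both assume inc_x > 0. */

/* Highest index read by the source loop: iteration i reads x[ix] and
   x[ix+1] with ix = i * inc_x, so the last iteration reaches
   (n - 1) * inc_x + 1.  Returns -1 when the loop does not run. */
long max_index_source(long n, long inc_x)
{
        return n > 0 ? (n - 1) * inc_x + 1 : -1;
}

/* Highest index read by the vectorized loop in the listing: on top of its
   own two elements, every iteration loads the two elements of the next
   iteration through vinsertf128, so the last iteration reaches
   n * inc_x + 1. */
long max_index_vectorized(long n, long inc_x)
{
        return n > 0 ? n * inc_x + 1 : -1;
}
```

For the test case (n = 1, inc_x = 4096*4096) the source loop stops at index 1,
while the modelled vectorized loop reads up to index 16777217, which starts
128 MiB past &x and &y.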

Related issue (coming from OpenBLAS):
https://github.com/easybuilders/easybuild-easyconfigs/issues/16387
This may also be related:
https://github.com/xianyi/OpenBLAS/issues/3740#issuecomment-1233899834
(That comment shows very similar code, but it is for GCC 12, which vectorizes
by default at -O2. OpenBLAS worked around this by disabling the tree
vectorizer there, but only on macOS and Windows.)
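
For reference, one way to switch off the tree vectorizer for a single function
is GCC's optimize function attribute. The sketch below is an assumption for
illustration, not necessarily what OpenBLAS does; the names dot_novec and
NO_TREE_VECTORIZE are hypothetical, and the GCC manual recommends this
attribute for debugging rather than production code.

```c
/* Hypothetical macro: expands to GCC's optimize attribute, which acts like
   -fno-tree-vectorize for the one function it is attached to.  Other
   compilers get an empty expansion. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_TREE_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#define NO_TREE_VECTORIZE
#endif

/* Same loop as in the report, compiled without the tree vectorizer. */
NO_TREE_VECTORIZE
double dot_novec(int n, const double *x, int inc_x, const double *y)
{
        double dot[4] = { 0.0, 0.0, 0.0, 0.0 };
        int i, ix = 0;

        for (i = 0; i < n; i++) {
                dot[0] += x[ix]     * y[ix];
                dot[1] += x[ix + 1] * y[ix + 1];
                dot[2] += x[ix]     * y[ix + 1];
                dot[3] += x[ix + 1] * y[ix];
                ix += inc_x;
        }

        return dot[0] + dot[1] + dot[2] + dot[3];
}
```

Building the whole file with -fno-tree-vectorize should have the same effect
for every function in it.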
