https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66598

            Bug ID: 66598
           Summary: With -O3 gcc incorrectly assumes aligned SSE
                    instructions (e.g. movapd) can be used
           Product: gcc
           Version: 4.9.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: michael.l...@uni-ulm.de
  Target Milestone: ---

When compiled with -O3 by gcc-4.9 or gcc-5.0, the following code crashes
with "Segmentation fault: 11" on all my Intel machines with SSE:

-----------------------------------
double Q[4*64];
double P[5*64];

int
main()
{
   int i, j;
   double *p = P;
   double *q = Q;

   for (j=0; j<32; ++j) {
       for (i=0; i<4; ++i) {
           q[i] = p[i];
       }
       q += 4;
       p += 5;
   }

   return 0;
}
-----------------------------------

Looking at the generated assembly, the problem is in this loop:

-----------------------------------
L2:
       movapd  16(%rax), %xmm0
       addq    $40, %rax
       addq    $32, %rdx
       movapd  -40(%rax), %xmm1
       movaps  %xmm0, -16(%rdx)
       movaps  %xmm1, -32(%rdx)
       cmpq    %rcx, %rax
       jne     L2
-----------------------------------

So %rax contains the address of p.  But even if p=P is initially aligned
correctly on a 16-byte boundary, P+5 is not: it lies 5 doubles (40 bytes)
past P, and 40 is not a multiple of 16.  So movapd must not be used.
Changing the assembly code manually to

-----------------------------------
L2:
       movupd  16(%rax), %xmm0
       addq    $40, %rax
       addq    $32, %rdx
       movupd  -40(%rax), %xmm1
       movaps  %xmm0, -16(%rdx)
       movaps  %xmm1, -32(%rdx)
       cmpq    %rcx, %rax
       jne     L2
-----------------------------------

fixed the problem.


Cheers,

Michael
