https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66598
Bug ID: 66598
Summary: With -O3 gcc incorrectly assumes aligned SSE instructions (e.g. movapd) can be used
Product: gcc
Version: 4.9.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: michael.l...@uni-ulm.de
Target Milestone: ---

Compiled with gcc-4.9 and gcc-5.0 at -O3, the following code causes a "Segmentation fault: 11" on all my Intel machines with SSE:

-----------------------------------
double Q[4*64];
double P[5*64];

int main()
{
    int i, j;
    double *p = P;
    double *q = Q;

    for (j=0; j<32; ++j) {
        for (i=0; i<4; ++i) {
            q[i] = p[i];
        }
        q += 4;
        p += 5;
    }
    return 0;
}
-----------------------------------

Looking at the assembly code, the problem is in

-----------------------------------
L2:
    movapd  16(%rax), %xmm0
    addq    $40, %rax
    addq    $32, %rdx
    movapd  -40(%rax), %xmm1
    movaps  %xmm0, -16(%rdx)
    movaps  %xmm1, -32(%rdx)
    cmpq    %rcx, %rax
    jne     L2
-----------------------------------

So %rax contains the address of p. But even if p=P is initially aligned correctly on a 16-byte boundary, P+5 is not: 5 doubles are 40 bytes, which is not a multiple of 16. So movapd must not be used for the loads from p. Changing the assembly code manually to

-----------------------------------
L2:
    movupd  16(%rax), %xmm0
    addq    $40, %rax
    addq    $32, %rdx
    movupd  -40(%rax), %xmm1
    movaps  %xmm0, -16(%rdx)
    movaps  %xmm1, -32(%rdx)
    cmpq    %rcx, %rax
    jne     L2
-----------------------------------

fixed the problem.

Cheers,

Michael