I'm using the stock Fedora 12 gcc, RPM gcc-4.4.3-4.fc12.x86_64. Although it appears to align __m128 variables correctly (on 16 bytes), it accesses them as if they were only 8-byte aligned.
I have read various threads about aligning variables on the stack, but my belief is that this was never a problem on x86_64, and in fact gcc-4.3.2 generates the expected code.

#include <xmmintrin.h>

void
test(__m128 *x)
{
  volatile __m128 tmp = *x;
}

generates

gcc-4.4.3-4.fc12:

0000000000000000 <test>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 89 7d e8             mov    %rdi,-0x18(%rbp)
   8:   48 8b 45 e8             mov    -0x18(%rbp),%rax
   c:   0f 28 00                movaps (%rax),%xmm0
   f:   0f 13 45 f0             movlps %xmm0,-0x10(%rbp)
  13:   0f 17 45 f8             movhps %xmm0,-0x8(%rbp)
  17:   c9                      leaveq
  18:   c3                      retq

gcc-4.3.2: (correct)

0000000000000000 <test>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 89 7d e8             mov    %rdi,0xffffffffffffffe8(%rbp)
   8:   48 8b 45 e8             mov    0xffffffffffffffe8(%rbp),%rax
   c:   0f 28 00                movaps (%rax),%xmm0
   f:   0f 29 45 f0             movaps %xmm0,0xfffffffffffffff0(%rbp)
  13:   c9                      leaveq
  14:   c3                      retq

-- 
           Summary: SSE2 / stack align
           Product: gcc
           Version: 4.4.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: kai dot germaschewski at gmail dot com
 GCC build triplet: x86_64-redhat-linux
  GCC host triplet: x86_64-redhat-linux
GCC target triplet: x86_64-redhat-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43124