I have found several ways to "fix" the latest issue, but they all boil down to never passing an __m128d value on the call stack. For instance, change
    static __m128d __attribute__((noinline, unused)) test (__m128d s1, __m128d s2)

to

    static __m128d test (__m128d s1, __m128d s2)

and the program works. Similarly, change the function to

    static __m128d __attribute__((noinline)) test (__m128d *s1, __m128d *s2)
    {
      return _mm_add_pd (*s1, *s2);
    }

and it also works.

Things I tried to force a 16 byte stack alignment that didn't work:

  1. -mstackrealign
  2. -mpreferred-stack-boundary=4
  3. -mincoming-stack-boundary=4
  4. 2 and 3
  5. 1 and 2 and 3

I guess the bigger question is why can an __m128d be passed on the call stack reliably when -msse2 is invoked, but not otherwise? If the compiler cannot do this reliably shouldn't it throw an error or warning?

Thanks,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
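
For anyone who wants to compare the two calling styles side by side, here is a minimal sketch. It is not the original test program: the names test_by_value, test_by_pointer and check_variants, and the input values, are made up for illustration, and it assumes an x86 target where <emmintrin.h> is usable (for example, built with -msse2).

```c
#include <assert.h>
#include <emmintrin.h>

/* By-value form: both __m128d arguments are passed directly to a
   function that the compiler is not allowed to inline. */
static __m128d __attribute__((noinline)) test_by_value (__m128d s1, __m128d s2)
{
  return _mm_add_pd (s1, s2);
}

/* By-pointer form: only addresses cross the call boundary. */
static __m128d __attribute__((noinline)) test_by_pointer (__m128d *s1, __m128d *s2)
{
  return _mm_add_pd (*s1, *s2);
}

/* Hypothetical helper: runs both forms on the same inputs, stores the
   by-value result in out[], and asserts that the two forms agree. */
static void check_variants (double out[2])
{
  __m128d a = _mm_set_pd (2.0, 1.0);    /* low element 1.0, high element 2.0 */
  __m128d b = _mm_set_pd (20.0, 10.0);  /* low element 10.0, high element 20.0 */
  double by_ptr[2];

  _mm_storeu_pd (out, test_by_value (a, b));
  _mm_storeu_pd (by_ptr, test_by_pointer (&a, &b));
  assert (out[0] == by_ptr[0] && out[1] == by_ptr[1]);
}
```

Calling check_variants() from main() and building with each of the flag combinations listed above should show whether the by-value call is the one that misbehaves on a given setup. The assert fires only if the two forms disagree.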