> Things I tried to force a 16 byte stack alignment that didn't work:
>
> 1 -mstackrealign
> 2 -mpreferred-stack-boundary=4
> 3 -mincoming-stack-boundary=4
> 4 2 and 3
> 5 1 and 2 and 3

And this is why they didn't work. Change the test function to

    static __m128d __attribute__((noinline,aligned (16)))
    test ( __m128d s1, __m128d s2)
    {
      printf("test s1"); _mm_dump_fd(s1);
      printf("test s2"); _mm_dump_fd(s2);
      printf("loc s1 %p\n",&s1);
      printf("loc s2 %p\n",&s2);
      return _mm_add_pd (s1, s2);
    }

compile and run

    gcc -Wall -msse -mno-sse2 -I. -lm -DSOFT_SSE2 -DEMMSOFTDBG -O1 -o foo_wno foo.c
    [r...@newsaf i386]# ./foo_wno
    mm_set_pd, in 2134.334300 1234.635654
    mm_set_pd, in 41124.234000 2344.235400
    s1 0 1 1234.635654 2134.334300
    s2 0 1 2344.235400 41124.234000
    s1.xDEBUG m_d_fd: 1234.635654 2134.334300
    s2.xDEBUG m_d_fd: 2344.235400 41124.234000
    test s1DEBUG m_d_fd: 1234.635654 2134.334300
    test s2DEBUG m_d_fd: 2134.334300 41124.234000
    loc s1 0x7fff6b6ccb10 <----------------------
    loc s2 0x7fff6b6ccb00 <----------------------
    s1.xDEBUG m_d_fd: 1234.635654 2134.334300
    s2.xDEBUG m_d_fd: 2344.235400 41124.234000
    expected e0 e1 3578.871054 43258.568300
    result r0 r1 3368.969954 43258.568300
    Aborted

s1 and s2 within test are already 16 byte aligned, so the extra
alignment switches did not help. Somehow this code

    u.x = test (s1.x, s2.x);

is putting the wrong values for s2 onto the call stack. Bizarre.
Either I'm missing something or turning off SSE2 is uncovering a
compiler bug.

Thanks,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
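
P.S. The two addresses marked with arrows above can be checked for 16 byte alignment mechanically rather than by eye. The following is only a minimal sketch: the helper names addr_is_aligned and ptr_is_aligned are made up for illustration and are not part of foo.c or of any header used in the test, and the boundary is assumed to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from foo.c): nonzero if addr is a multiple
 * of boundary. Assumes boundary is a power of two, so the low bits of
 * an aligned address are all zero. */
static int addr_is_aligned(unsigned long long addr, unsigned long long boundary)
{
    return (addr & (boundary - 1ULL)) == 0;
}

/* Hypothetical convenience wrapper for a live pointer, e.g. &s1 inside test(). */
static int ptr_is_aligned(const void *p, size_t boundary)
{
    return addr_is_aligned((unsigned long long)(uintptr_t)p, boundary);
}
```

Called with the values printed in the run above, addr_is_aligned(0x7fff6b6ccb10ULL, 16) and addr_is_aligned(0x7fff6b6ccb00ULL, 16) both return nonzero, since the low four bits of each address are zero. Inside test() the same check could be written as ptr_is_aligned(&s1, 16).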