> Things I tried to force a 16 byte stack alignment that didn't work:
> 
> 1  -mstackrealign
> 2  -mpreferred-stack-boundary=4
> 3  -mincoming-stack-boundary=4
> 4  2 and 3
> 5  1 and 2 and 3

And this is why they didn't work.  Change the test function to

 static __m128d __attribute__((noinline,aligned (16))) test (__m128d s1, __m128d s2)
 {
   printf("test s1"); _mm_dump_fd(s1);
   printf("test s2"); _mm_dump_fd(s2);
   printf("loc s1 %p\n",&s1);
   printf("loc s2 %p\n",&s2);
   return _mm_add_pd (s1, s2);
 }

compile and run

 gcc -Wall -msse -mno-sse2  -I. -lm -DSOFT_SSE2 -DEMMSOFTDBG  -O1  -o foo_wno foo.c
[r...@newsaf i386]# ./foo_wno
mm_set_pd, in 2134.334300 1234.635654
mm_set_pd, in 41124.234000 2344.235400
s1        0 1 1234.635654 2134.334300
s2        0 1 2344.235400 41124.234000
s1.xDEBUG m_d_fd:   1234.635654  2134.334300
s2.xDEBUG m_d_fd:   2344.235400 41124.234000
test s1DEBUG m_d_fd:   1234.635654  2134.334300
test s2DEBUG m_d_fd:   2134.334300 41124.234000
loc s1 0x7fff6b6ccb10   <----------------------
loc s2 0x7fff6b6ccb00   <----------------------
s1.xDEBUG m_d_fd:   1234.635654  2134.334300
s2.xDEBUG m_d_fd:   2344.235400 41124.234000
expected e0 e1 3578.871054 43258.568300
result   r0 r1 3368.969954 43258.568300
Aborted
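
The two "loc" addresses printed above can be checked for 16 byte alignment mechanically. The following is only a minimal sketch: is_aligned_addr is a hypothetical helper that is not part of foo.c, and it assumes the boundary is a power of two.

```c
#include <stdint.h>

/* Hypothetical helper, not part of foo.c.
   Returns 1 when addr is a multiple of boundary, 0 otherwise.
   Assumes boundary is a power of two, so the low bits of addr
   must all be zero for addr to be aligned. */
static int is_aligned_addr(uint64_t addr, uint64_t boundary)
{
    return (addr & (boundary - 1)) == 0;
}
```

Called with 0x7fff6b6ccb10 or 0x7fff6b6ccb00 and a boundary of 16 it returns 1, since both addresses end in a zero hex digit. For a live pointer, something like is_aligned_addr((uint64_t)(uintptr_t)&s1, 16) should work.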

s1 and s2 within test are already 16 byte aligned (both addresses end
in a zero hex digit), so the extra alignment switches could not have
helped.  Somehow this code

  u.x = test (s1.x, s2.x);

is putting the wrong value for the low element of s2 onto the call
stack: test sees 2134.334300 (the high element of s1) where
2344.235400 should be.
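
The numbers in the output are consistent with that reading. As a rough sketch, using plain doubles in place of __m128d (add_pd_plain and close_to are hypothetical names, standing in for _mm_add_pd and a tolerance compare), the observed r0 is what you get when the low element of s2 is replaced by the high element of s1:

```c
/* Hypothetical stand-in for _mm_add_pd on plain doubles:
   element-wise sum, out[i] = a[i] + b[i]. */
static void add_pd_plain(const double a[2], const double b[2], double out[2])
{
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
}

/* Returns 1 when x and y differ by no more than tol. */
static int close_to(double x, double y, double tol)
{
    double d = x - y;
    if (d < 0)
        d = -d;
    return d <= tol;
}
```

With s1 = {1234.635654, 2134.334300} and the s2 that test actually printed, {2134.334300, 41124.234000}, the sum is about {3368.969954, 43258.568300}, matching the "result" line.  With the intended s2 = {2344.235400, 41124.234000} it is about {3578.871054, 43258.568300}, matching the "expected" line.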

Bizarre.  Either I'm missing something or turning off SSE2 is uncovering
a compiler bug.

Thanks,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
