Simon Jenkins wrote: I can definitely get
for a simplified example, i'm using
float t[4]; ... asm ("movaps %%xmm1, %0" : : "m" (t[0]));
to move 4 packed floats from xmm1 into 't'.
I couldn't get this to fail in practice, though I didn't try all that hard, unless t isn't on a 16-byte boundary, in which case it segfaults.
it failed here just a minute ago, with g++ -O6. not a segfault, but gcc seemed to think that some members of t are zero and omitted them from the final summation in my code (r = t[0] + t[1] + t[2] + t[3]).
In theory however your code is telling the compiler that array element t[0] is in memory from which the instruction reads. It should be more like:
asm ("movaps %%xmm1, %0" : "=m" (t) );
which now tells the compiler that the entire array t is in memory to which the instruction writes. This *ought* to discourage the optimiser from doing anything too drastic. (Maybe/AFAIK/IANAL/etc).
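A minimal sketch of that output-operand form, for reference. It is not code from this thread: the helper name sum4_via_xmm1 is made up, the load into xmm1 is folded into the same asm statement so the register contents are well defined, and t is made static so that the 16-byte alignment does not depend on how the compiler lays out the stack. On targets without SSE it falls back to memcpy.

```c
#include <string.h>

/* Hypothetical helper: sums four floats by round-tripping them through xmm1. */
static float sum4_via_xmm1(const float src[4])
{
    /* static + aligned(16): movaps faults on an address that is not 16-byte aligned */
    static float t[4] __attribute__ ((aligned(16)));

#if defined(__SSE__)
    __asm__ ("movups %1, %%xmm1\n\t"            /* unaligned load of 4 floats   */
             "movaps %%xmm1, %0"                /* aligned store into t         */
             : "=m" (t)                         /* output: all of t is written  */
             : "m" (*(const float (*)[4]) src)  /* input: 16 bytes at src read  */
             : "xmm1");                         /* xmm1 is overwritten          */
#else
    memcpy(t, src, sizeof t);                   /* fallback for non-SSE targets */
#endif
    return t[0] + t[1] + t[2] + t[3];
}
```

Because t appears as an "=m" output, the compiler has to assume every element may have changed before it evaluates the sum.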
you're right of course, 't' should be an output, not an input. however,
asm ("movaps %%xmm1, %0" : "=m" (t));
segfaults, but
asm ("movaps %%xmm1, %0" : "=m" (t[0]));
works. think i'll have to resort to 128 bit wide data types, a simple cast should do. all this gcc inline asm stuff is ugly anyway, and what's another cast among friends.
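One way the 128-bit type idea might look, as a rough sketch only. Instead of casting the float array, this declares the variable with GCC's generic vector type, which is a substitution on my part; the typedef name v4sf and the helper sum4_via_vector are made up for illustration, and the load into xmm1 is done inside the same asm statement so the example stands alone.

```c
/* v4sf: hypothetical name for a GCC generic vector of 4 floats (16 bytes) */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical helper: stores xmm1 into a 128-bit object and sums its lanes. */
static float sum4_via_vector(const float src[4])
{
    v4sf v;   /* the type itself is 16 bytes wide and 16-byte aligned */

#if defined(__SSE__)
    __asm__ ("movups %1, %%xmm1\n\t"            /* unaligned load of 4 floats   */
             "movaps %%xmm1, %0"                /* aligned store into v         */
             : "=m" (v)                         /* output: the whole vector     */
             : "m" (*(const float (*)[4]) src)  /* input: 16 bytes at src read  */
             : "xmm1");                         /* xmm1 is overwritten          */
#else
    for (int i = 0; i < 4; i++)                 /* fallback for non-SSE targets */
        v[i] = src[i];
#endif
    return v[0] + v[1] + v[2] + v[3];           /* GCC allows subscripting vectors */
}
```

With a single 128-bit object as the operand there is no question of which elements the asm touches, which is the ambiguity the t[0] form runs into.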
asm ("movaps %%xmm1, %0" : "=m" (t[0]));
to exhibit the optimisation problem (the one I couldn't get your original line to show) and then fix it again by removing the [0].
I was getting a segfault on about 50% of compiles as I modified the code, because the array was being aligned to 8-byte boundaries but not to 16-byte ones. Declaring it as
float t[4] __attribute__ ((aligned(16)));
got rid of those. Note though that this attribute doesn't work for automatic variables.
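A quick way to check the alignment at run time, sketched under the assumption that the array lives at file scope (static storage), given the caveat about automatic variables. The helper name is_aligned16 is made up for illustration.

```c
#include <stdint.h>

/* File-scope array, so the attribute applies to static storage */
static float t[4] __attribute__ ((aligned(16)));

/* Hypothetical helper: returns 1 if p sits on a 16-byte boundary, else 0 */
static int is_aligned16(const void *p)
{
    return ((uintptr_t) p % 16) == 0;
}
```

Calling is_aligned16(t) before the movaps store is a cheap sanity check while the code is still changing from compile to compile.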
Simon Jenkins (Bristol, UK)
