Tim Goetze wrote:

Simon Jenkins wrote:


for a simplified example, i'm using

float t[4];
...
asm ("movaps %%xmm1, %0" : : "m" (t[0]));

to move 4 packed floats from xmm1 into 't'.


I couldn't get this to fail in practice (though I didn't
try all that hard) unless t isn't on a 16-byte boundary,
in which case it segfaults.


it failed here just a minute ago, with g++ -O6. not a segfault, but gcc seemed to think that some members of t are zero and omitted them from the final summation in my code (r = t[0] + t[1] + t[2] + t[3]).


In theory however your code is telling the compiler that
array element t[0] is in memory from which the instruction
reads. It should be more like:

asm ("movaps %%xmm1, %0" : "=m" (t) );

which now tells the compiler that the entire array t
is in memory to which the instruction writes. This
*ought* to discourage the optimiser from doing
anything too drastic. (Maybe/AFAIK/IANAL/etc).
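
A minimal sketch of the output-operand form described above, not code from
this thread. The names src, t and sum_via_xmm1 are made up for illustration.
The load into xmm1 is folded into the same asm statement and xmm1 is listed
as a clobber, so the compiler cannot reuse the register in between. Static
storage is assumed so that the 16-byte alignment movaps needs is honoured,
and a plain C copy stands in on targets without SSE.

```c
#include <stddef.h>

/* Hypothetical example data; static so the alignment attribute applies. */
static float src[4] __attribute__ ((aligned (16))) = { 1.0f, 2.0f, 3.0f, 4.0f };
static float t[4]   __attribute__ ((aligned (16)));

/* Copy src to t through xmm1, then sum the four elements of t. */
float sum_via_xmm1 (void)
{
#if defined (__SSE__)
    /* "=m" (t)  : the whole array is memory the asm writes.
       "m" (src) : memory the asm reads.
       "xmm1"    : the register is overwritten by the asm. */
    __asm__ ("movaps %1, %%xmm1\n\t"
             "movaps %%xmm1, %0"
             : "=m" (t)
             : "m" (src)
             : "xmm1");
#else
    /* Fallback for targets without SSE: plain element copy. */
    for (size_t i = 0; i < 4; i++)
        t[i] = src[i];
#endif
    return t[0] + t[1] + t[2] + t[3];
}
```

Because t is declared as an output, the compiler has to assume all four
elements changed and reload them for the summation.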


you're right of course, 't' should be an output, not an input. however,

asm ("movaps %%xmm1, %0" : "=m" (t));

segfaults, but

asm ("movaps %%xmm1, %0" : "=m" (t[0]));

works. think i'll have to resort to 128 bit wide data types, a
simple cast should do. all this gcc inline asm stuff is ugly anyway,
and what's another cast among friends.
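
One way the "128 bit wide data type plus a cast" idea could look, as a
sketch only. The typedef v4sf and the names in4, out4 and sum_via_cast are
hypothetical. The type uses GCC's vector_size and may_alias attributes, so
the operand covers all 16 bytes and may legally overlap plain float objects.
Alignment again relies on static storage, and a vector assignment stands in
on targets without SSE.

```c
/* Hypothetical 128-bit type: four packed floats. may_alias lets an
   lvalue of this type overlap ordinary float objects. */
typedef float v4sf __attribute__ ((vector_size (16), may_alias));

static float in4[4]  __attribute__ ((aligned (16))) = { 0.5f, 1.5f, 2.5f, 3.5f };
static float out4[4] __attribute__ ((aligned (16)));

/* Copy in4 to out4 through xmm1, then sum the four elements of out4. */
float sum_via_cast (void)
{
#if defined (__SSE__)
    /* The casts make each memory operand 16 bytes wide, so the
       compiler knows every element of out4 is written. */
    __asm__ ("movaps %1, %%xmm1\n\t"
             "movaps %%xmm1, %0"
             : "=m" (*(v4sf *) out4)
             : "m" (*(const v4sf *) in4)
             : "xmm1");
#else
    /* Fallback for targets without SSE: generic vector copy. */
    *(v4sf *) out4 = *(const v4sf *) in4;
#endif
    return out4[0] + out4[1] + out4[2] + out4[3];
}
```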

I can definitely get

asm ("movaps %%xmm1, %0" : "=m" (t[0]));

to exhibit the optimisation problem (the one I couldn't get your
original line to show) and then fix it again by removing the [0].

I was getting a segfault on about 50% of compiles, as I modified
the code, because the array was being aligned to 8 byte boundaries
but not to 16 bytes. Declaring it as

float t[4] __attribute__ ((aligned(16)));

got rid of those. Note though that this attribute doesn't work for
automatic variables.
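
A small sketch for checking the alignment at run time instead of waiting
for the segfault. The names t_aligned, is_aligned16 and t_is_aligned are
made up for illustration, and the array has static storage, in line with
the caveat about automatic variables.

```c
#include <stdint.h>

/* Hypothetical array, declared with the attribute shown above. */
static float t_aligned[4] __attribute__ ((aligned (16)));

/* Returns 1 if p sits on a 16-byte boundary, 0 otherwise. */
int is_aligned16 (const void *p)
{
    return ((uintptr_t) p % 16) == 0;
}

/* Convenience wrapper: is the example array safe to use with movaps? */
int t_is_aligned (void)
{
    return is_aligned16 (t_aligned);
}
```

Calling t_is_aligned() before the asm statement (for example inside an
assert) turns a misaligned buffer into a clear diagnostic.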

Simon Jenkins
(Bristol, UK)




