http://bugs.freedesktop.org/show_bug.cgi?id=12216





------- Comment #8 from [EMAIL PROTECTED]  2007-08-30 12:52 PST -------
(In reply to comment #4)
> Looks to me like the src address may just not be 128bit aligned, which is
> required for movaps. In this case, using movups instead should fix this. Could
> you try if this works correctly? (though you might hit the same problem
> elsewhere, there's lots of similar code around so I wouldn't be surprised if
> it's wrong in other places too.) A better solution might be to actually change
> the code so it guarantees it's 128bit aligned, if possible (using aligned
> mallocs, though I'm right now not quite sure if there even is some
> compiler-independent solution to do this if the variable is on the stack
> instead?).


Manually aligning a stack variable in a compiler-independent way is fairly
simple, but like an aligned malloc() implementation, it takes slightly more
memory and setup than a compiler would normally need (e.g. with GCC's
__attribute__((aligned(16)))). The basic idea is that a 128-bit value requires
16 bytes of stack space, and in the worst case (address mod 16 equal to 1) up
to 15 extra bytes of padding are needed. To keep it round, I would suggest just
using 32 bytes. To find the aligned address, you need:

#include <stdint.h> /* uintptr_t */

char data[32];      /* 16 bytes for the value + up to 15 bytes of padding */
uintptr_t addr;     /* Integer wide enough for a pointer (ILP32, LP64, LLP64) */
float* aligned_ptr;

addr = (uintptr_t)&data[0];

addr = (addr + 15) & ~(uintptr_t)15; /* Round up to the next multiple of 16 */

aligned_ptr = (float*)addr; /* One 128-bit (16-byte) slot, 16-byte aligned */
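The same rounding can be wrapped in a small helper and reused for a heap
allocator. This is a minimal sketch, assuming a C99 compiler with <stdint.h>;
align16(), aligned_malloc16() and aligned_free16() are hypothetical names, not
existing Mesa functions. The allocator over-allocates by the same 15 bytes of
worst-case padding plus room for one pointer, and stores the original malloc()
result just below the aligned block so it can be handed back to free().

```c
#include <assert.h>  /* assert */
#include <stdint.h>  /* uintptr_t */
#include <stdlib.h>  /* malloc, free, size_t */

/* Round p up to the next 16-byte boundary (no-op if already aligned). */
static void *align16(void *p)
{
    uintptr_t addr = (uintptr_t)p;
    addr = (addr + 15) & ~(uintptr_t)15;
    return (void *)addr;
}

/* Hypothetical aligned malloc: reserve room for the original pointer plus
   up to 15 bytes of padding, then stash the original pointer in the slot
   just below the aligned block. */
static void *aligned_malloc16(size_t size)
{
    void *raw = malloc(size + 15 + sizeof(void *));
    void **aligned;

    if (raw == NULL)
        return NULL;
    aligned = (void **)align16((char *)raw + sizeof(void *));
    aligned[-1] = raw;                          /* remembered for free() */
    assert(((uintptr_t)aligned & 15) == 0);     /* debug-only sanity check */
    return aligned;
}

/* Release a block obtained from aligned_malloc16(). */
static void aligned_free16(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

Where the platform offers it, POSIX posix_memalign() does the heap case with
less code.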


Replacing movaps with movups carries a severe performance penalty in areas with
heavy memory traffic. Forcing the lowest common denominator also means you
cannot use x86 memory operands to relieve register pressure, since SSE/SSE2/SSE3
instructions that take a memory operand require it to be aligned, just as movaps
does. I don't suggest replacing every instance; getting aligned data in the
first place would be the optimal solution. In fact, I don't know of a single
machine architecture that dislikes aligned data, but I can certainly think of a
few that choke on unaligned data. It would probably be better to add a debug
assert() checking the addresses for alignment, or better yet, to write a
general-case version that checks the alignment inside the assembly itself and
branches to an aligned or unaligned path, e.g.:

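test rax, 15        ; rax holds the pointer; ZF=1 if the low 4 bits are zero
jz   aligned_loop   ; 16-byte aligned: take the movaps path

unaligned_loop:     ; fallback path using movups
[...code...]
ret
aligned_loop:       ; fast path using movaps / aligned memory operands
[...code...]
ret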
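For C code the same dispatch can be expressed with SSE intrinsics. This is a
minimal sketch, assuming an x86 compiler that provides <xmmintrin.h>;
sum_floats(), is_aligned16() and ASSERT_ALIGNED16() are hypothetical names, not
Mesa code. It tests the low four bits of the pointer, uses _mm_load_ps
(movaps) when they are zero and _mm_loadu_ps (movups) otherwise; the macro is
the debug assert() variant for routines that only implement the aligned path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h> /* SSE intrinsics (x86 only) */

/* Nonzero if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* Hypothetical example routine: sums n floats (n must be a multiple of 4),
   choosing the load instruction at run time like the asm dispatch. */
static float sum_floats(const float *p, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    float lanes[4];
    size_t i;

    assert(n % 4 == 0);
    if (is_aligned16(p)) {
        /* Aligned path: _mm_load_ps maps to movaps, and the compiler may
           fold it into the addps memory operand. */
        for (i = 0; i < n; i += 4)
            acc = _mm_add_ps(acc, _mm_load_ps(p + i));
    } else {
        /* Unaligned fallback: _mm_loadu_ps maps to movups. */
        for (i = 0; i < n; i += 4)
            acc = _mm_add_ps(acc, _mm_loadu_ps(p + i));
    }
    _mm_storeu_ps(lanes, acc);  /* lanes[] itself need not be aligned */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

/* Debug-build check for routines that only have the aligned path. */
#define ASSERT_ALIGNED16(p) assert(is_aligned16(p))
```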


I'm pretty familiar with x86 assembly language, especially the SIMD instruction
sets; perhaps I could help a bit?

Patrick Baggett


_______________________________________________
Mesa3d-dev mailing list
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
