http://bugs.freedesktop.org/show_bug.cgi?id=12216
------- Comment #8 from [EMAIL PROTECTED]  2007-08-30 12:52 PST -------
(In reply to comment #4)
> Looks to me like the src address may just not be 128bit aligned, which is
> required for movaps. In this case, using movups instead should fix this. Could
> you try if this works correctly? (though you might hit the same problem
> elsewhere, there's lots of similar code around so I wouldn't be surprised if
> it's wrong in other places too.) A better solution might be to actually change
> the code so it guarantees it's 128bit aligned, if possible (using aligned
> mallocs, though I'm right now not quite sure if there even is some
> compiler-independent solution to do this if the variable is on the stack
> instead?).

Manually aligning a stack variable in a compiler-independent way is fairly simple, but like an aligned malloc() implementation, it takes slightly more memory and some setup that a compiler would normally do for you (e.g. GCC's __attribute__((aligned(16)))). The basic idea is that a 128-bit value requires 16 bytes of stack space, and in the worst case, where its address mod 16 is 1, up to 15 extra bytes of padding are needed. To keep it round, I would suggest just using 32 bytes. To find the aligned address, you need:

    char data[32];
    unsigned long addr; /* Same size as a pointer in LP64 and ILP32 models (not in LLP64/Win64) */
    float* aligned_ptr;

    addr = (unsigned long)&data[0];
    addr = (addr + 15) & ~0x0FUL; /* Round up to the next multiple of 16 */
    aligned_ptr = (float*)addr;   /* Now have a single 128-bit variable aligned */

Replacing movaps with movups carries a severe performance penalty in areas with heavy memory traffic. Also, by forcing the lowest common denominator, you cannot use x86 memory operands to relieve register pressure, since SSE/SSE2/SSE3 instructions that take memory operands require the same 16-byte alignment that movaps does. I don't suggest replacing every instance; rather, I imagine that getting aligned data in the first place would be the optimal solution.
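A runnable C sketch of the same round-up arithmetic, plus the "aligned malloc" variant mentioned in the quoted reply. It assumes a C99 compiler so that uintptr_t (always pointer-sized, including on LLP64) can replace unsigned long. The names align16, aligned_malloc16 and aligned_free16 are hypothetical helpers for illustration, not existing Mesa functions; where available, POSIX posix_memalign() does the heap part for you.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Round p up to the next multiple of 16; an already aligned p is returned
 * unchanged, so at most 15 bytes of padding are ever consumed. */
static void *align16(void *p)
{
    uintptr_t addr = (uintptr_t)p;
    addr = (addr + 15) & ~(uintptr_t)15;
    return (void *)addr;
}

/* Hypothetical portable 16-byte-aligned malloc: over-allocate by 15 bytes
 * of padding plus room for the original pointer, which is stashed just
 * below the aligned block so aligned_free16() can pass it back to free(). */
static void *aligned_malloc16(size_t size)
{
    char *raw = malloc(size + 15 + sizeof(void *));
    char *aligned;
    if (raw == NULL)
        return NULL;
    aligned = align16(raw + sizeof(void *));
    memcpy(aligned - sizeof(void *), &raw, sizeof raw);
    return aligned;
}

/* Recover the pointer saved by aligned_malloc16() and release it. */
static void aligned_free16(void *p)
{
    void *raw;
    if (p == NULL)
        return;
    memcpy(&raw, (char *)p - sizeof(void *), sizeof raw);
    free(raw);
}
```

On the stack, usage mirrors the fragment above: declare char data[32] and take float *v = align16(data); v then has room for one 16-byte vector inside data. Storing the original pointer with memcpy avoids assuming anything about the alignment of the slot just below the aligned block.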
In fact, I don't know of a single machine architecture that doesn't prefer aligned data, but I can certainly think of a few that choke on unaligned data. It would probably be better to place a debug assert() that checks the addresses for alignment, or better yet, to write a general-case version that does the check inside the assembly itself, e.g.:

        test rax, 15        ; some pointer in rax
        jz   aligned_loop
    unaligned_loop:
        [...code...]
        ret
    aligned_loop:
        [...code...]

I'm pretty familiar with x86 assembly language, especially the SIMD instruction sets; perhaps I could help a bit?

Patrick Baggett
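The test-and-branch idea can be sketched in C with SSE intrinsics instead of hand-written assembly (a substitution, not what the post proposes). _mm_load_ps requires a 16-byte-aligned address and is normally emitted as an aligned load such as movaps, while _mm_loadu_ps maps to movups. sum_floats is a hypothetical kernel that assumes n is a multiple of 4; the SIMD path is guarded by GCC/Clang's __SSE__ macro, with a plain scalar loop as the fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical kernel: sum n floats (n a multiple of 4), picking the
 * aligned or unaligned loop at run time, like "test rax, 15 / jz". */
static float sum_floats(const float *src, size_t n)
{
    float total = 0.0f;
    size_t i;
    assert(n % 4 == 0);
#if defined(__SSE__)
    __m128 acc = _mm_setzero_ps();
    float lanes[4];
    if (((uintptr_t)src & 15) == 0) {
        /* aligned_loop: aligned 128-bit loads (movaps-style) */
        for (i = 0; i < n; i += 4)
            acc = _mm_add_ps(acc, _mm_load_ps(src + i));
    } else {
        /* unaligned_loop: unaligned 128-bit loads (movups) */
        for (i = 0; i < n; i += 4)
            acc = _mm_add_ps(acc, _mm_loadu_ps(src + i));
    }
    /* Horizontal sum of the four lanes. */
    _mm_storeu_ps(lanes, acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#else
    /* No SSE: plain scalar loop. */
    for (i = 0; i < n; i++)
        total += src[i];
#endif
    return total;
}
```

A debug build could additionally assert ((uintptr_t)src & 15) == 0 at the top of code that must only ever see aligned input, which turns a silent slow path or a crash into an immediate, located failure.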
