Sorry for the delay in responding. Well, I tested it again and, really, just putting the alignment in the .text segment didn't work. I then changed all the remaining movdqa instructions (only the %rip-relative ones, of course), because I'm almost sure they would also cause problems. The change involves at most 3 instructions per call, so the performance impact, as you noted, should be totally negligible. The patch is attached; if someone could please commit it to CVS, I think the matter is then closed.
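For reference, the change has roughly the following form; this is an illustrative before/after pair using one of the mask constants the code loads, not a verbatim excerpt from the attached patch.

	/* before: movdqa requires its 128-bit memory operand to be
	 * 16-byte aligned and faults (the segmentation faults
	 * discussed in this thread) when it is not */
	movdqa mX000X000X000X000(%rip), %xmm6

	/* after: movdqu performs the same 16-byte load at any
	 * alignment; it runs only once per call, in the setup code
	 * before the pixel loop, so the cost is negligible */
	movdqu mX000X000X000X000(%rip), %xmm6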
Thanks, and thanks also for the info about ip-relative addressing,
Tiago

On Thu, 2005-08-25 at 03:33 -0600, John Slaten wrote:
> Hmmm. That should have worked.
> Basically, what ip-relative addressing does is make the code
> position-independent. When the assembler sees an instruction like that,
> it makes the offset (mX000X000X000X000) equal to the offset from the
> next instruction in the code to the specified data, so that no matter
> where the program is located in memory, the data can always be found
> right at the same spot.
> I don't know why forcing the alignment did not make the program work.
> I suppose that the only solution then is to change all of the (%rip)
> movdqa instructions to movdqu - if the changes are kept to the
> ip-relative ones, the performance impact should be minimal, as these
> are only executed once per function call.
>
> On Thu, 2005-08-25 at 06:44 +0000, Tiago Victor Gehring wrote:
> > Hi,
> > thanks for the info and help - nothing like hearing from the person
> > who wrote the code...
> > I applied your patch and put the original lines back, with the
> > aligned copy, but I now get the same problem as before: segmentation
> > faults.
> > The first line that gives me a problem is #570; this is a piece of
> > the code so you can find it:
> >
> > 	LEAVE
> > SIZE(imlib_amd64_blend_rgba_to_rgb)
> > PR_(imlib_amd64_blend_rgba_to_rgba):
> > 	ENTER
> >
> > 	pxor %xmm4, %xmm4
> > 	movdqa c1(%rip), %xmm5        -> *** here's the first problem
> > 	xorq %rax, %rax
> > 	movdqa mX000X000X000X000(%rip), %xmm6
> > 	movq [EMAIL PROTECTED](%rip), %r13
> >
> > And I confirmed that this time I compiled it with the ".align 16" in
> > the .text segment.
> > Just being curious now: could you perhaps explain what an instruction
> > like the one above does? I mean, what does the "mask" in front of the
> > %rip do - do you take the address of the next instruction, apply
> > (AND) a mask and move the result to %xmm5? Sorry, just curious...
> >
> > Thanks,
> > Tiago
> >
> > On Wed, 2005-08-24 at 18:45 -0600, John Slaten wrote:
> > > Since I wrote the original code, I thought I'd weigh in on this.
> > >
> > > 1. The memory that is causing the errors is statically allocated
> > > data in the .text segment, which I assumed would be correctly
> > > aligned, so I forgot to supply the .align directive. Adding it (as
> > > in the attached patch) _should_ fix the problem, though I have not
> > > tried to reproduce/test it. The code should then work with the
> > > aligned instructions, as in the original.
> > > 2. The existing code jumps through quite a lot of hoops to make the
> > > best use of the aligned instructions. For instance, when it
> > > encounters an odd pixel address, it will process a single pixel at
> > > the start of the loop to force alignment for the destination
> > > address (which is both read and written, and is thus more
> > > important). In fact, due to the possibility of an odd scanline
> > > pitch, the alignment is checked at the start of each scanline, and
> > > the correct instructions are used accordingly.
> > > 3. The code was built to handle weird input that is only 1-byte
> > > aligned. Thus, it should handle any alignment that is thrown at it;
> > > if it doesn't, that's a bug, but it should be fixable.
> > > 4. I don't recall the exact statistics, but I ran tests on aligned
> > > vs unaligned instructions while I was writing the code, and using
> > > the aligned instructions gives a large speed boost. I think it was
> > > about 20%, but I might be wrong.
> > > The key is that movdqa is a double path instruction and movdqu is
> > > a vector path instruction, and double path instructions are a whole
> > > lot quicker than vector path ones.
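To illustrate point 1 of the quoted explanation: the masks are 16-byte constants placed in the .text segment, and movdqa faults unless its memory operand sits on a 16-byte boundary, so the data block needs an explicit .align directive. Below is a minimal sketch of that arrangement; the label names (my_mask, my_function) and byte values are made up for illustration and are not taken from amd64_blend.S.

	.text
	.align 16	/* pad to a 16-byte boundary so the constant */
my_mask:		/* below can be loaded with movdqa           */
	.byte 0xff,0,0,0, 0xff,0,0,0, 0xff,0,0,0, 0xff,0,0,0

my_function:
	/* %rip-relative addressing, as described above: the assembler
	 * and linker turn "my_mask" into a fixed 32-bit displacement
	 * from the end of this instruction, so the constant is found
	 * at the same relative spot wherever the code is loaded. */
	movdqa my_mask(%rip), %xmm6
	ret

The same form also answers the "mask" question quoted above: the symbol before (%rip) is not a mask applied to the instruction pointer, just a displacement added to it; the mask is the 16 bytes of data that end up in %xmm6.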
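A much simplified sketch of the idea in point 2 of the quoted explanation: check the destination's alignment at the start of each scanline and pick the matching store instruction. It assumes %rsi holds the source pointer, %rdi the destination pointer and %rcx a pixel count that is a non-zero multiple of 4; the real loops in amd64_blend.S additionally peel off a leading pixel and handle left-over pixels, which is omitted here.

scanline_start:
	testq $15, %rdi		/* destination 16-byte aligned?      */
	jnz unaligned_loop	/* no: take the movdqu store path    */

aligned_loop:
	movdqu (%rsi), %xmm0	/* the source may still be unaligned */
	movdqa %xmm0, (%rdi)	/* aligned store on the fast path    */
	addq $16, %rsi
	addq $16, %rdi
	subq $4, %rcx		/* four 32-bit ARGB pixels per chunk */
	jnz aligned_loop
	jmp scanline_done

unaligned_loop:
	movdqu (%rsi), %xmm0
	movdqu %xmm0, (%rdi)	/* unaligned store: slower but safe  */
	addq $16, %rsi
	addq $16, %rdi
	subq $4, %rcx
	jnz unaligned_loop

scanline_done: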
diff -r -u e17/libs/imlib2/src/lib/amd64_blend.S e17_ori/e17/libs/imlib2/src/lib/amd64_blend.S
--- e17/libs/imlib2/src/lib/amd64_blend.S	2005-08-26 20:06:33.000000000 +0000
+++ e17_ori/e17/libs/imlib2/src/lib/amd64_blend.S	2005-08-22 11:06:35.000000000 +0000
@@ -1264,7 +1264,7 @@
 PR_(imlib_amd64_copy_rgb_to_rgba):
 	ENTER
 
-	movdqu mX000X000X000X000(%rip), %xmm5
+	movdqa mX000X000X000X000(%rip), %xmm5
 
 	leaq (%rsi, %r8, 4), %rsi
 	leaq (%rdi, %r8, 4), %rdi
@@ -1395,7 +1395,7 @@
 	ENTER
 
 	pxor %xmm4, %xmm4
-	movdqu m00XXXXXX(%rip), %xmm6
+	movdqa m00XXXXXX(%rip), %xmm6
 
 	/* Move right to left across each line, */
 	/* processing in two pixel chunks */
@@ -1774,9 +1774,9 @@
 	ENTER
 
 	pxor %xmm4, %xmm4
-	movdqu c1(%rip), %xmm5
+	movdqa c1(%rip), %xmm5
 	xorq %rax, %rax
-	movdqu mX000X000X000X000(%rip), %xmm6
+	movdqa mX000X000X000X000(%rip), %xmm6
 	movq [EMAIL PROTECTED](%rip), %r13
 	/* Move right to left across each line, */
 	/* processing in two pixel chunks */
@@ -2213,7 +2213,7 @@
 PR_(imlib_amd64_add_copy_rgba_to_rgb):
 	ENTER
 
-	movdqu m0XXX0XXX0XXX0XXX(%rip), %xmm5
+	movdqa m0XXX0XXX0XXX0XXX(%rip), %xmm5
 
 	leaq (%rsi, %r8, 4), %rsi
 	leaq (%rdi, %r8, 4), %rdi
@@ -2364,7 +2364,7 @@
 PR_(imlib_amd64_add_copy_rgba_to_rgba):
 	ENTER
 
-	movdqu m0XXX0XXX0XXX0XXX(%rip), %xmm5
+	movdqa m0XXX0XXX0XXX0XXX(%rip), %xmm5
 
 	leaq (%rsi, %r8, 4), %rsi
 	leaq (%rdi, %r8, 4), %rdi
@@ -2515,7 +2515,7 @@
 PR_(imlib_amd64_add_copy_rgb_to_rgba):
 	ENTER
 
-	movdqu mX000X000X000X000(%rip), %xmm5
+	movdqa mX000X000X000X000(%rip), %xmm5
 
 	leaq (%rsi, %r8, 4), %rsi
 	leaq (%rdi, %r8, 4), %rdi
@@ -2667,7 +2667,7 @@
 	ENTER
 
 	pxor %xmm4, %xmm4
-	movdqu m00XXXXXX(%rip), %xmm6
+	movdqa m00XXXXXX(%rip), %xmm6
 
 	/* Move right to left across each line, */
 	/* processing in two pixel chunks */
@@ -3047,9 +3047,9 @@
 
 	movq [EMAIL PROTECTED](%rip), %r13
 	pxor %xmm4, %xmm4
-	movdqu c1(%rip), %xmm5
-	movdqu mX000X000X000X000(%rip), %xmm6
-	movdqu mX000X000(%rip), %xmm7
+	movdqa c1(%rip), %xmm5
+	movdqa mX000X000X000X000(%rip), %xmm6
+	movdqa mX000X000(%rip), %xmm7
 	xorq %rax, %rax
 	/* Move right to left across each line, */
 	/* processing in two pixel chunks */
@@ -3495,7 +3495,7 @@
 PR_(imlib_amd64_subtract_copy_rgba_to_rgb):
 	ENTER
 
-	movdqu m0XXX0XXX0XXX0XXX(%rip), %xmm5
+	movdqa m0XXX0XXX0XXX0XXX(%rip), %xmm5
 
 	leaq (%rsi, %r8, 4), %rsi
 	leaq (%rdi, %r8, 4), %rdi
@@ -3646,8 +3646,8 @@
 PR_(imlib_amd64_subtract_copy_rgba_to_rgba):
 	ENTER
 
-	movdqu m0XXX0XXX0XXX0XXX(%rip), %xmm5
-	movdqu mX000X000X000X000(%rip), %xmm6
+	movdqa m0XXX0XXX0XXX0XXX(%rip), %xmm5
+	movdqa mX000X000X000X000(%rip), %xmm6
 
 	leaq (%rsi, %r8, 4), %rsi
 	leaq (%rdi, %r8, 4), %rdi
@@ -3818,7 +3818,7 @@
 PR_(imlib_amd64_subtract_copy_rgb_to_rgba):
 	ENTER
 
-	movdqu mX000X000X000X000(%rip), %xmm5
+	movdqa mX000X000X000X000(%rip), %xmm5
 
 	leaq (%rsi, %r8, 4), %rsi
 	leaq (%rdi, %r8, 4), %rdi
@@ -3970,8 +3970,8 @@
 	ENTER
 
 	pxor %xmm4, %xmm4
-	movdqu m000V0V0V000V0V0V(%rip), %xmm6
-	movdqu m00XXXXXX(%rip), %xmm7
+	movdqa m000V0V0V000V0V0V(%rip), %xmm6
+	movdqa m00XXXXXX(%rip), %xmm7
 
 	/* Move right to left across each line, */
 	/* processing in two pixel chunks */
@@ -4288,10 +4288,10 @@
 
 	movq [EMAIL PROTECTED](%rip), %r13
 	pxor %xmm4, %xmm4
-	movdqu c1(%rip), %xmm5
-	movdqu mX000X000X000X000(%rip), %xmm6
-	movdqu m0XXX0XXX0XXX0XXX(%rip), %xmm7
-	movdqu m000V0V0V000V0V0V(%rip), %xmm8
+	movdqa c1(%rip), %xmm5
+	movdqa mX000X000X000X000(%rip), %xmm6
+	movdqa m0XXX0XXX0XXX0XXX(%rip), %xmm7
+	movdqa m000V0V0V000V0V0V(%rip), %xmm8
 	xorq %rax, %rax
 	/* Move right to left across each line, */
 	/* processing in two pixel chunks */
@@ -4682,8 +4682,8 @@
 PR_(imlib_amd64_reshade_copy_rgba_to_rgb):
 	ENTER
 
-	movdqu m0XXX0XXX0XXX0XXX(%rip), %xmm5
-	movdqu m0VVV0VVV0VVV0VVV(%rip), %xmm6
+	movdqa m0XXX0XXX0XXX0XXX(%rip), %xmm5
+	movdqa m0VVV0VVV0VVV0VVV(%rip), %xmm6
 
 	leaq (%rsi, %r8, 4), %rsi
 	leaq (%rdi, %r8, 4), %rdi
Only in e17/libs/imlib2/src/lib: amd64_blend.S~