Sorry for the delay in responding.
Well, I tested it again and, really, just putting the alignment in
the .text segment didn't work.
I then changed all the remaining movdqa instructions (only the %rip-relative
ones, of course), because I'm almost sure they would also cause problems.
The change involved at most 3 instructions per call, so the performance
impact, as you noted, should be totally negligible.
The patch is attached; if someone could please commit it to CVS, I think
the matter is closed.

Thanks, and thanks also for the info about %rip-relative addressing,

Tiago




On Thu, 2005-08-25 at 03:33 -0600, John Slaten wrote:
> Hmmm. That should have worked.
>   Basically, what %rip-relative addressing does is make the code position
> independent. When the assembler sees an instruction like that, it makes
> the offset (to mX000X000X000X000) equal to the offset from the next
> instruction in the code to the specified data, so that no matter where
> the program is located in memory, the data can always be found at the
> same spot relative to the code.
>   I don't know why forcing the alignment did not make the program work.
> I suppose that the only solution then is to change all of the (%rip)
> movdqa instructions to movdqu - if the changes are kept to the
> ip-relative ones, the performance impact should be minimal as these are
> only executed once per function call.
> 
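For reference, a minimal sketch of how the assembler resolves a %rip-relative
operand like the ones above (the addresses here are made up purely for
illustration):

    /* say c1 ends up at 0x401000 and this movdqa is assembled at 0x4005f0;
     * the instruction is 8 bytes long, so the next one starts at 0x4005f8 */
    movdqa  c1(%rip), %xmm5     /* encoded displacement = 0x401000 - 0x4005f8 = 0xa08 */
                                /* at run time the CPU adds that displacement to the  */
                                /* address of the NEXT instruction, so the data is    */
                                /* found no matter where the code is loaded           */
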
> On Thu, 2005-08-25 at 06:44 +0000, Tiago Victor Gehring wrote:
> > Hi,
> > thanks for the info and help - nothing like hearing from the person who
> > wrote the code...
> > I then applied your patch and put the original lines back with the
> > aligned copies, but now I get the same problem as before again:
> > segmentation faults.
> > The first line that gives me a problem is #570; this is a piece of the
> > code so you can find it:
> > 
> >     LEAVE
> > SIZE(imlib_amd64_blend_rgba_to_rgb)
> > PR_(imlib_amd64_blend_rgba_to_rgba):
> >     ENTER
> > 
> >     pxor %xmm4, %xmm4
> >     movdqa c1(%rip), %xmm5          -> *** here's the first problem
> >     xorq %rax, %rax
> >     movdqa mX000X000X000X000(%rip), %xmm6
> >     movq [EMAIL PROTECTED](%rip), %r13
> > 
> > And I confirmed that this time I compiled it with the ".align 16" in the
> > .text segment.
> > And just being curious, could you perhaps explain what an
> > instruction like the one above does? I mean, what does the "mask" in front
> > of the %rip do - do you take the address of the next instruction, apply
> > (AND) a mask and move it to %xmm5? Sorry, just curious...
> > 
> > Thanks,
> > Tiago
> > 
> > 
> > 
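As John explains further up in the thread, mX000X000X000X000 and c1 are just
labels on 16-byte constants stored alongside the code, so nothing is ANDed
with %rip - the instruction is a plain 16-byte load from that label's address.
A minimal sketch (the data values here are illustrative, not the actual imlib2
constants):

    .text
    .align 16                   /* 16-byte alignment, which is what movdqa requires */
c1: .word 0x0001, 0x0001, 0x0001, 0x0001
    .word 0x0001, 0x0001, 0x0001, 0x0001

    /* ... later, inside a function ... */
    movdqa  c1(%rip), %xmm5     /* 16-byte load from the address labelled c1 */
    movdqu  c1(%rip), %xmm5     /* same load, but tolerates any alignment */
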
> > On Wed, 2005-08-24 at 18:45 -0600, John Slaten wrote:
> > 
> > > 
> > > Since I wrote the original code, I thought I'd weigh in on this.
> > > 
> > > 1. The memory that is causing the errors is statically allocated data in
> > > the .text segment, which I assumed would be correctly aligned and forgot
> > > to supply the .align directive. Adding this (as in the attached patch)
> > > _should_ fix the problem, though I have not tried to reproduce/test it.
> > > The code should then work with the aligned instructions as in the
> > > original.
> > > 2. The existing code jumps through quite a lot of hoops to make best use
> > > of the aligned instructions. For instance, when it encounters an odd
> > > pixel address, it will process a single pixel at the start of the loop
> > > to force alignment for the destination address (which uses both read and
> > > write, and is thus more important). In fact, due to the possibility of
> > > odd scanline pitch, the alignment is checked at the start of each
> > > scanline, and the correct instructions are used accordingly.
> > > 3. The code was built to handle weird input that is only 1 byte aligned.
> > > Thus, it should handle any alignment that is thrown at it, and if it
> > > doesn't that's a bug, but it should be fixable.
> > > 4. I don't recall the exact statistics, but I ran tests on aligned vs
> > > unaligned instructions while I was writing the code, and using the
> > > aligned instructions gives a large speed boost. I think it was about
> > > 20%, but I might be wrong. The key is that movdqa is a double path
> > > instruction and movdqu is a vector path instruction, and double path
> > > instructions are a whole lot quicker than vector path ones.
> > > 
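A rough sketch of the destination-alignment peeling described in point 2 (the
register assignments and labels are invented for illustration, and the real
code blends pixels rather than just copying them):

    /* %rsi = src, %rdi = dst, %rcx = pixels left on this scanline */
.Lpeel:
    testq   $15, %rdi           /* destination already 16-byte aligned? */
    jz      .Laligned
    testq   %rcx, %rcx          /* any pixels left to peel? */
    jz      .Ldone
    movd    (%rsi), %xmm0       /* handle one 4-byte pixel at a time ... */
    movd    %xmm0, (%rdi)       /* ... until the destination is aligned */
    addq    $4, %rsi
    addq    $4, %rdi
    decq    %rcx
    jmp     .Lpeel
.Laligned:
    /* the bulk loop can now safely use movdqa on the destination; the source
     * may still be unaligned, which is presumably why the alignment check is
     * redone per scanline and the matching instructions picked, as point 2 says */
.Ldone:
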
> > 
> > 
diff -r -u e17/libs/imlib2/src/lib/amd64_blend.S e17_ori/e17/libs/imlib2/src/lib/amd64_blend.S
--- e17/libs/imlib2/src/lib/amd64_blend.S	2005-08-26 20:06:33.000000000 +0000
+++ e17_ori/e17/libs/imlib2/src/lib/amd64_blend.S	2005-08-22 11:06:35.000000000 +0000
@@ -1264,7 +1264,7 @@
 PR_(imlib_amd64_copy_rgb_to_rgba):
 	ENTER
 
-	movdqu mX000X000X000X000(%rip), %xmm5
+	movdqa mX000X000X000X000(%rip), %xmm5
 
 	leaq (%rsi, %r8, 4), %rsi
 	leaq (%rdi, %r8, 4), %rdi
@@ -1395,7 +1395,7 @@
 	ENTER
 
 	pxor %xmm4, %xmm4
-	movdqu m00XXXXXX(%rip), %xmm6
+	movdqa m00XXXXXX(%rip), %xmm6
 
 	/* Move right to left across each line, */ 
 	/* processing in two pixel chunks */ 
@@ -1774,9 +1774,9 @@
 	ENTER
 
 	pxor %xmm4, %xmm4
-	movdqu c1(%rip), %xmm5
+	movdqa c1(%rip), %xmm5
 	xorq %rax, %rax
-	movdqu mX000X000X000X000(%rip), %xmm6
+	movdqa mX000X000X000X000(%rip), %xmm6
 	movq [EMAIL PROTECTED](%rip), %r13
 
 	/* Move right to left across each line, */ 
@@ -2213,7 +2213,7 @@
 PR_(imlib_amd64_add_copy_rgba_to_rgb):
 	ENTER
 
-	movdqu m0XXX0XXX0XXX0XXX(%rip), %xmm5
+	movdqa m0XXX0XXX0XXX0XXX(%rip), %xmm5
 
 	leaq (%rsi, %r8, 4), %rsi
 	leaq (%rdi, %r8, 4), %rdi
@@ -2364,7 +2364,7 @@
 PR_(imlib_amd64_add_copy_rgba_to_rgba):
 	ENTER
 
-	movdqu m0XXX0XXX0XXX0XXX(%rip), %xmm5
+	movdqa m0XXX0XXX0XXX0XXX(%rip), %xmm5
 
 	leaq (%rsi, %r8, 4), %rsi
 	leaq (%rdi, %r8, 4), %rdi
@@ -2515,7 +2515,7 @@
 PR_(imlib_amd64_add_copy_rgb_to_rgba):
 	ENTER
 
-	movdqu mX000X000X000X000(%rip), %xmm5
+	movdqa mX000X000X000X000(%rip), %xmm5
 
 	leaq (%rsi, %r8, 4), %rsi
 	leaq (%rdi, %r8, 4), %rdi
@@ -2667,7 +2667,7 @@
 	ENTER
 
 	pxor %xmm4, %xmm4
-	movdqu m00XXXXXX(%rip), %xmm6
+	movdqa m00XXXXXX(%rip), %xmm6
 
 	/* Move right to left across each line, */ 
 	/* processing in two pixel chunks */ 
@@ -3047,9 +3047,9 @@
 
 	movq [EMAIL PROTECTED](%rip), %r13
 	pxor %xmm4, %xmm4
-	movdqu c1(%rip), %xmm5
-	movdqu mX000X000X000X000(%rip), %xmm6
-	movdqu mX000X000(%rip), %xmm7
+	movdqa c1(%rip), %xmm5
+	movdqa mX000X000X000X000(%rip), %xmm6
+	movdqa mX000X000(%rip), %xmm7
 	xorq %rax, %rax
 
 	/* Move right to left across each line, */ 
@@ -3495,7 +3495,7 @@
 PR_(imlib_amd64_subtract_copy_rgba_to_rgb):
 	ENTER
 
-	movdqu m0XXX0XXX0XXX0XXX(%rip), %xmm5
+	movdqa m0XXX0XXX0XXX0XXX(%rip), %xmm5
 
 	leaq (%rsi, %r8, 4), %rsi
 	leaq (%rdi, %r8, 4), %rdi
@@ -3646,8 +3646,8 @@
 PR_(imlib_amd64_subtract_copy_rgba_to_rgba):
 	ENTER
 
-	movdqu m0XXX0XXX0XXX0XXX(%rip), %xmm5
-	movdqu mX000X000X000X000(%rip), %xmm6
+	movdqa m0XXX0XXX0XXX0XXX(%rip), %xmm5
+	movdqa mX000X000X000X000(%rip), %xmm6
 
 	leaq (%rsi, %r8, 4), %rsi
 	leaq (%rdi, %r8, 4), %rdi
@@ -3818,7 +3818,7 @@
 PR_(imlib_amd64_subtract_copy_rgb_to_rgba):
 	ENTER
 
-	movdqu mX000X000X000X000(%rip), %xmm5
+	movdqa mX000X000X000X000(%rip), %xmm5
 
 	leaq (%rsi, %r8, 4), %rsi
 	leaq (%rdi, %r8, 4), %rdi
@@ -3970,8 +3970,8 @@
 	ENTER
 
 	pxor %xmm4, %xmm4
-	movdqu m000V0V0V000V0V0V(%rip), %xmm6
-	movdqu m00XXXXXX(%rip), %xmm7
+	movdqa m000V0V0V000V0V0V(%rip), %xmm6
+	movdqa m00XXXXXX(%rip), %xmm7
 
 	/* Move right to left across each line, */ 
 	/* processing in two pixel chunks */ 
@@ -4288,10 +4288,10 @@
 
 	movq [EMAIL PROTECTED](%rip), %r13
 	pxor %xmm4, %xmm4
-	movdqu c1(%rip), %xmm5
-	movdqu mX000X000X000X000(%rip), %xmm6
-	movdqu m0XXX0XXX0XXX0XXX(%rip), %xmm7
-	movdqu m000V0V0V000V0V0V(%rip), %xmm8
+	movdqa c1(%rip), %xmm5
+	movdqa mX000X000X000X000(%rip), %xmm6
+	movdqa m0XXX0XXX0XXX0XXX(%rip), %xmm7
+	movdqa m000V0V0V000V0V0V(%rip), %xmm8
 	xorq %rax, %rax
 
 	/* Move right to left across each line, */ 
@@ -4682,8 +4682,8 @@
 PR_(imlib_amd64_reshade_copy_rgba_to_rgb):
 	ENTER
 
-	movdqu m0XXX0XXX0XXX0XXX(%rip), %xmm5
-	movdqu m0VVV0VVV0VVV0VVV(%rip), %xmm6
+	movdqa m0XXX0XXX0XXX0XXX(%rip), %xmm5
+	movdqa m0VVV0VVV0VVV0VVV(%rip), %xmm6
 
 	leaq (%rsi, %r8, 4), %rsi
 	leaq (%rdi, %r8, 4), %rdi
Only in e17/libs/imlib2/src/lib: amd64_blend.S~
