On Sun, 13 Oct 2002 09:34:17 +0100
Jos� Fonseca <[EMAIL PROTECTED]> wrote:
> Felix,
>
> On Sun, Oct 13, 2002 at 02:47:18AM +0200, Felix K�hling wrote:
> >More results:
> >
> >I get the same strange behaviour when I compile with -march=pentium2. I
> >could narrow it down to lib/GL/mesa/src/drv/radeon/radeon_state.c. Then
> >I split radeon_state.c and put only one non-static function at a time in
> >a new radeon_state2.c along with the necessary static functions. Only
> >radeon_state2 was compiled with -march=athlon.
> >
> >I could reproduce the error with only radeonUpdateScissor and
> >radeonUpdateViewportOffset in radeon_state2.c. On the other hand I don't
> >get the error if I compile everything with -march=athlon except
> >radeon_state2.c which means that I've neatly isolated the problem :)
> >
>
> Nice detective work! ;-)
>
> >I had gcc generate assembler output for radeon_state2.c with
> >-mcpu=athlon and -march=athlon. A diff of the two assembler files and
> >radeon_state2.c (for comparing line numbers) are attached. The version
> >compiled with -march=athlon uses %mm0, the other one doesn't. My guess
> >is that some other part of the radeon driver or Mesa makes assumptions
> >about the MMX state which are not true when compiling with
> >-march=athlon.
>
> I don't have the Athlon processor documentation with me, but when using
> MMX there are two things that _usually_ must be done:
> - never mix MMX with FP code, since they both use the same internal
> registers
> - for the reason above, one has to call the EMMS instruction before
> making FP calculations after using MMS
>
> Both of these rules aren't being followed below. I know that newer
> processors don't have some of these limitations (I know at least that
> calling EMMS doesn't take so long as in the original Pentium MMX) but
> only once I refer to the specs I can be sure that gcc isn't generating
> wrong code here.
I read the IA-32 specs and they are quite clear about mixing MMX with
FPU code. Any MMX instruction except EMMS clobbers the entire FPU state
by reseting the TOS and setting the TAG word to 0 thus marking all FPU
registers "valid". So the generated code cannot be correct here. I
reported this to the Debian Bug Tracking System since Debian's gcc-3.2
is a prerelease. The maintainer shall forward it to GNU if appropriate.
Appart from that, I think that radeonUpdateViewportOffset is buggy. The
aliasing of int and float is inconsistent. The comparison is done
without aliasing, in the subsequent assignment the floats are aliased to
ints. I attached a patch that is a workaround for the gcc-3.2 problem at
the same time. I guess it's the inconsistent aliasing that confused gcc.
> Another issue is that when using %mm0 it's moving 64 bits back and
> forward, but on the pentium2 code, it only moves 32 bits. I'm not sure if
> the code semantics allow this - it would be necessary to match the
> assembly with the C code to actually see if this is OK.
>
> >
> >I tried disabling MesaUse3DNow and MesaUseMMX in host.def, but that
> >didn't help.
> [...]
>
> If above is true, then it doesn't depend of MMX being used elsewhere to
> cause the bug.
>
> Jos� Fonseca
>
>
> >@@ -212,25 +212,23 @@
> > .loc 1 93 0
> > movl 216(%ebx), %edx
> > .loc 1 91 0
> >- movl -24(%ebp), %esi
> >+ movd -24(%ebp), %mm0
> > .loc 1 93 0
> > fildl 8(%edx)
> > movl %ecx, -24(%ebp)
> > flds -24(%ebp)
> > fxch %st(1)
> >- fucompp
> >- fnstsw %ax
> >- andb $69, %ah
> >- xorb $64, %ah
> >+ fucomip %st(1), %st
> >+ fstp %st(0)
> > jne .L14
> >+ jp .L14
> > fildl 16(%edx)
> >- movl %esi, -24(%ebp)
> >+ movd %mm0, -24(%ebp)
> > flds -24(%ebp)
> > fxch %st(1)
> >- fucompp
> >- fnstsw %ax
> >- andb $69, %ah
> >- cmpb $64, %ah
> >+ fucomip %st(1), %st
> >+ fstp %st(0)
> >+ jp .L14
> > je .L13
> > .L14:
> > .loc 1 105 0
> >@@ -240,7 +238,7 @@
> > .LBE6:
> > movl %ecx, 8(%edx)
> > .loc 1 100 0
> >- movl %esi, 16(%edx)
> >+ movd %mm0, 16(%edx)
> > .loc 1 111 0
> > .LBB7:
> > movl $31, %edx
>
>
Regards,
Felix
__\|/__ ___ ___ ___
__Tsch��_______\_6 6_/___/__ \___/__ \___/___\___You can do anything,___
_____Felix_______\�/\ \_____\ \_____\ \______U___just not everything____
[EMAIL PROTECTED] >o<__/ \___/ \___/ at the same time!
Index: radeon_state.c
===================================================================
RCS file: /cvsroot/dri/xc/xc/lib/GL/mesa/src/drv/radeon/radeon_state.c,v
retrieving revision 1.18
diff -u -r1.18 radeon_state.c
--- radeon_state.c 25 Aug 2002 22:24:39 -0000 1.18
+++ radeon_state.c 13 Oct 2002 17:41:23 -0000
@@ -1414,8 +1414,8 @@
GLfloat tx = v[MAT_TX] + xoffset;
GLfloat ty = (- v[MAT_TY]) + yoffset;
- if ( rmesa->hw.vpt.cmd[VPT_SE_VPORT_XOFFSET] != tx ||
- rmesa->hw.vpt.cmd[VPT_SE_VPORT_YOFFSET] != ty )
+ if ( rmesa->hw.vpt.cmd[VPT_SE_VPORT_XOFFSET] != *(GLuint *)&tx ||
+ rmesa->hw.vpt.cmd[VPT_SE_VPORT_YOFFSET] != *(GLuint *)&ty )
{
/* Note: this should also modify whatever data the context reset
* code uses...