Vince Weaver wrote:
> Hello
>
> The attached patch implements the x87 fsincos and fnstsw instructios on 
> x86.
>
> These are actually the only unimplemented x86 instructions used by -msse3 
> compiled benchmarks.
>
> Because of that, I created this hackish patch, which lets the benchmarks 
> run.
>
> With this and my other patches, all of spec2k now run.  I won't know if 
> all of them finish correctly for a few days yet though.
>
> The correct thing to do regarding this patch would be to fully implement 
> x87 using proper micro-ops.  I know that was discussed on this list 
> before.
>
> Note in the patch below, fnstsw isn't implemented right.  I couldn't 
> figure out what register constraints equaled "AX or a 16-bit memory 
> location" so I cheated because I knew the benchmarks always used the AX 
> form.  It's important to set AX to zero on such a call, otherwise the 
> benchmarks that use fsincos will loop forever dividing by 2pi waiting for 
> a no-overflow condition (and I don't think we emulate the status word 
> correctly).
>
>
>
> diff -r fb4a3a61bc74 src/arch/x86/isa/decoder/x87.isa
> --- a/src/arch/x86/isa/decoder/x87.isa        Wed Nov 04 13:22:15 2009 -0500
> +++ b/src/arch/x86/isa/decoder/x87.isa        Fri Nov 06 16:24:44 2009 -0500
> @@ -124,7 +124,7 @@
>                      0x0: fprem();
>                      0x1: fyl2xp1();
>                      0x2: fsqrt();
> -                    0x3: fsincos();
> +                    0x3: Inst::FSINCOS();
>                      0x4: frndint();
>                      0x5: fscale();
>                      0x6: fsin();
> @@ -265,7 +265,7 @@
>              }
>              0x7: decode MODRM_MOD {
>                  0x3: Inst::UD2();
> -                default: fnstsw();
> +                default: Inst::FNSTSW();
>              }
>          }
>          //0x6: esc6();
> @@ -326,7 +326,7 @@
>              }
>              0x4: decode MODRM_MOD {
>                  0x3: decode MODRM_RM {
> -                    0x0: fnstsw();
> +                    0x0: Inst::FNSTSW();
>                      default: Inst::UD2();
>                  }
>                  default: fbld();
>
> diff -r fb4a3a61bc74 
> src/arch/x86/isa/insts/x87/control/save_x87_status_word.py
> --- a/src/arch/x86/isa/insts/x87/control/save_x87_status_word.py      Wed Nov 
> 04 13:22:15 2009 -0500
> +++ b/src/arch/x86/isa/insts/x87/control/save_x87_status_word.py      Fri Nov 
> 06 16:24:44 2009 -0500
> @@ -55,5 +55,10 @@
>  
>  microcode = '''
>  # FSTSW
> -# FNSTSW
> +
> +def macroop FNSTSW {
> +    limm t2,0
> +    mov rax, rax, t2, dataSize=2
> +};
> +
>  '''
>   

You should identify which of the two places fnstsw shows up uses memory
and which use AX since they have different opcodes. The combination of
letters that go in the parenthesis indicate what type of operand to use,
and are intended to be similar to what's in the AMD manual volume 3 at
the start of appendix A (and maybe the Intel manual somewhere). The
memory case, which I believe is the top one in your patch, would use
"M". Since it's a 16 bit value in memory, you'd use the "w" suffix. The
other can specify AX by using the rA, little r for a fixed register and
A to say which one, and the "w" suffix. Put together, they would be
Inst::FNSTSW(Mw) and Inst::FNSTSW(rAw).

If there's some static setting of the status word that makes life
easier, we could just install that at process start up for now. The ABI
seems to specify what a program can expect, but I don't think that
actually gets installed anywhere. If it's something that's expected to
change, I would strongly prefer implementing the status word correctly
instead of a hack that makes this case work but might trip somebody up
silently in the future. At minimum, we should throw in a "warn" microop
(specified in arch/x86/isa/microops/debug.isa) so that people know
something suspicious is happening.

> diff -r fb4a3a61bc74 
> src/arch/x86/isa/insts/x87/transcendental_functions/trigonometric_functions.py
> --- 
> a/src/arch/x86/isa/insts/x87/transcendental_functions/trigonometric_functions.py
>   Wed Nov 04 13:22:15 2009 -0500
> +++ 
> b/src/arch/x86/isa/insts/x87/transcendental_functions/trigonometric_functions.py
>   Fri Nov 06 16:24:44 2009 -0500
> @@ -56,7 +56,14 @@
>  microcode = '''
>  # FSIN
>  # FCOS
> -# FSINCOS
> +
> +def macroop FSINCOS {
> +    sinfp ufp1, st(0)
> +    cosfp ufp2, st(0)
> +    movfp st(0), ufp1
> +    movfp st(-1), ufp2, spm=-1
> +};
> +
>  # FPTAN
>  # FPATAN
>   

Is it necessary to use the intermediate microfp registers here? If you
write to st(-1) first, you don't have to worry about overwriting
anything. You may have to do something like this in cases where an
instruction may fault half way through. I looked into that a bit just
now, but it wasn't clear to me what the expected semantics are or how to
handle them correctly. X87 is apparently -almost- as confusing as X86
:P. X87 exceptions are not modeled at all as far as I remember, so it
wouldn't be a big problem to ignore those cases for now. If you're
intending to really fill out x87 (we'd really appreciate that, by the
way) you'll want to dig into those things more.

>  '''
> diff -r fb4a3a61bc74 src/arch/x86/isa/microops/fpop.isa
> --- a/src/arch/x86/isa/microops/fpop.isa      Wed Nov 04 13:22:15 2009 -0500
> +++ b/src/arch/x86/isa/microops/fpop.isa      Fri Nov 06 16:24:44 2009 -0500
> @@ -270,6 +270,21 @@
>      class Sqrtfp(FpOp):
>          code = 'FpDestReg = sqrt(FpSrcReg2);'
>  
> +    class Cosfp(FpOp):
> +        def __init__(self, dest, src1, spm=0, \
> +                SetStatus=False, dataSize="env.dataSize"):
> +            super(Cosfp, self).__init__(dest, src1, "InstRegIndex(0)", \
> +                    spm, SetStatus, dataSize)
> +        code = 'FpDestReg = cos(FpSrcReg1);'
> +
> +    class Sinfp(FpOp):
> +        def __init__(self, dest, src1, spm=0, \
> +                SetStatus=False, dataSize="env.dataSize"):
> +            super(Sinfp, self).__init__(dest, src1, "InstRegIndex(0)", \
> +                    spm, SetStatus, dataSize)
> +        code = 'FpDestReg = sin(FpSrcReg1);'
> +
> +
>      # Conversion microops
>      class ConvOp(FpOp):
>          abstract = True
>   

These look good. Do we want microops that do such complicated
operations? They're arguably not very important performance wise and
would be obnoxious to fully build out, but it seems really inaccurate
this way. I'm not apposed to either way, but we should come up with a
strategy that makes sense. Maybe we did already and I'm just behind the
times.

Gabe
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev

Reply via email to