Hi Andreas,

I won't claim to have any more knowledge of the dark arts here than you...
I think the issue is that that knowledge is lost in the mists of time and
can only be reconstructed by studying the ancient runes.

Looking at the changeset you point out, I think my confusion stemmed from
the distinction between NumFloatRegs and NumFloatArchRegs.  Intuitively, the
former should be the number of actual registers (state), while the latter
should be the number of architecturally visible registers.  Thus you would
expect NumFloatArchRegs <= NumFloatRegs, and in fact that's the case for
all the other ISAs where NumFloatArchRegs is defined: for Power and MIPS
we have:
  NumFloatRegs = NumFloatArchRegs + NumFloatSpecialRegs
while for Alpha and SPARC we have:
  NumFloatRegs = NumFloatArchRegs
(though for Power, NumFloatSpecialRegs = 0, so it could be written the
latter way as well).  However, in x86, we had:
  NumFloatArchRegs = NumFloatRegs + 8
which just seemed backwards.

In addition, we typically have Ctrl_Reg_Base = FP_Reg_Base +
NumFloatRegs, which wasn't true for x86; instead we effectively had
Ctrl_Reg_Base = FP_Reg_Base + NumFloatArchRegs (even though the value was
calculated separately).

Compounding the confusion is the fact that NumFloatArchRegs is not always
defined and very rarely used (and not used at all for x86).

So I "simplified" things by getting rid of NumFloatArchRegs and redefining
NumFloatRegs to be the old value of NumFloatArchRegs (which was 8 larger
than its previous value).  All the x86 regressions at the time passed, so I
assumed that I had not broken anything in doing so.

Looking at the changeset again, I see that the extra 8 in the old
Ctrl_Base_DepTag calculation, which did not seem to be included in
NumFloatRegs, is associated with the comment "The indices that are mapped
over the FP stack".  So in retrospect, if we have 8 additional
architectural names for existing architectural registers (effectively
architectural aliases for existing FP reg state), then I suppose it does
make sense that NumFloatArchRegs = NumFloatRegs + 8.

It's not clear where we use NumFloatRegs in a way where too large a value
matters; it seems to me that having 8 extra registers of state shouldn't
break things.  Obviously that's not the case, though.

I suggest taking that last '+ 8' off the computation of NumFloatRegs, then
adding it back in the enum, i.e.,
CC_Reg_Base = FP_Reg_Base + NumFloatRegs + 8,
and seeing whether that makes your regression pass.  I hope so.  If so, then we'll
have to decide whether we want to do anything fancier, like reintroduce
NumFloatArchRegs, or just use that code and add in some comments to explain
the situation.

Thanks,

Steve



On Mon, Feb 3, 2014 at 4:16 AM, Andreas Sandberg <[email protected]> wrote:

> Hi Everyone,
>
> I was just testing some x87 stuff on a new gem5 version and it seems like
> we have a pretty nasty regression. In recent versions of the code base, it
> seems like the stack isn't handled correctly. I was able to bisect down to
> the offending commit (7274310be1bb, isa: clean up register constants),
> which changes ISA constants on Alpha, Mips, SPARC, and x86.
>
> As far as I can tell, the NumFloatRegs constant is increased by 8, but
> Ctrl_Base_DepTag remains constant. I'm not sure how any of this affects the
> stack handling though. Could someone with more knowledge of the dark arts
> of x86 CPU simulation have a look? Steve?
>
>
> The test case I used was the following code:
>   fninit
>   fldcw fctl_extended
>   fld1
>   fldpi
>   fldpi
> ...
>   fctl_extended:
>           .word 0x037f
>
>
>
> These are the results I expect to get (and do get on KVM):
> info: FPU registers (XSave):
> info:   fcw: 0x37f
> info:   fsw: 0x2800 (top: 5, conditions: , exceptions:  )
> info:   ftwx: 0xe0
> info:   FP Stack:
> info:           ST0/5: 0x35c26821a2da0fc90040 (3.14159)
> info:           ST1/6: 0x35c26821a2da0fc90040 (3.14159)
> info:           ST2/7: 0x0000000000000080ff3f (1)
>
> On the Atomic CPU I get the following:
> info: FPU registers (XSave):
> info:   fcw: 0x37f
> info:   fsw: 0x2800 (top: 5, conditions: , exceptions:  )
> info:   ftwx: 0xe0
> info:   FP Stack:
> info:           ST0/5: 0x00000000000000000000 (0)
> info:           ST1/6: 0x00000000000000000000 (0)
> info:           ST2/7: 0x00000000000000000000 (0)
> info:           ST3/0: 0x00507721a2da0fc90040 (3.14159) (e)
> info:           ST4/1: 0x00507721a2da0fc90040 (3.14159) (e)
> info:           ST5/2: 0x0000000000000080ff3f (1) (e)
>
> //Andreas
>
> _______________________________________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/gem5-dev
>