Re: [gem5-dev] x86 FPU stack handling regression

Andreas Sandberg Tue, 04 Feb 2014 01:46:29 -0800

Steve,

Thanks for looking into this.

Hmm, it seems to make sense that NumFloatArchRegs would be larger thanNumFloatRegs on x86 because of the way MMX and x87 registers are mapped.To make things even more "interesting", we are using the theNUM_FLOATREGS constant in the microassembler (~old NumFloatRegs).

It seems like the problem stems from the way we handle the FP stack inthe microassembler. Stack-relative registers are mapped to a set ofvirtual registers after the normal FP registers, these have indices inthe range [NUM_FLOATREGS, NUM_FLOATREGS+8). Normally, these registersare read through the ThhreadContext::setFloatReg() interface which meansthat the index is flattened and stack-relative registers are mapped intoactual registers (X86ISA::flattenFloatIndex()). The problem here isX86ISA::copyRegs(), where we used setFloatRegBits() instead ofsetFloatRegBitsFlat(). This causes the x87 stack to be copied twice andsince the MISCREG_X87_TOP is undefined (likely 0) in the destination TC,the second copy goes to the wrong location on the stack.


I'll post a patch shortly.

//Andreas


On 2014-02-04 00:01, Steve Reinhardt wrote:

Hi Andreas,

I won't claim to having more knowledge of the dark arts here... I think the
issue is that that knowledge is lost in the mists of time and can only be
reconstructed by studying the ancient runes.

Looking at the changeset you point out, I think the confusion I had was
based on the distinction between NumFloatRegs and NumFloatArchRegs.
  Intuitively the former should be the number of actual registers (state),
while the latter should be the number of architecturally visible registers.
  Thus you would expect NumFloatArchRegs <= NumFloatRegs, and in fact that's
the case for all the other ISAs where NumFloatArchRegs is defined: for
Power and MIPS we have:
   NumFloatRegs = NumFloatArchRegs + NumFloatSpecialRegs
while for Alpha and SPARC we have:
   NumFloatRegs = NumFloatArchRegs
(though for Power, NumFloatSpecialRegs = 0, so it could be written the
latter way as well).  However, in x86, we had:
   NumFloatArchRegs = NumFloatRegs + 8
which just seemed backwards.

In addition, we typically have the case where Ctrl_Reg_Base = FP_Reg_Base +
NumFloatRegs, which wasn't true, because instead we had  trl_Reg_Base =
FP_Reg_Base + NumFloatArchRegs (effectively, even though the value was
calculated separately).

Compounding the confusion is the fact that NumFloatArchRegs is not always
defined and very rarely used (and not used at all for x86).

So I "simplified" things by getting rid of NumFloatArchRegs and redefining
NumFloatRegs to be the old value of NumFloatArchRegs (which was 8 larger
than its previous value).  All the x86 regressions at the time passed, so I
assumed that I had not broken anything in doing so.

Looking at the changeset again, I see that the quantity 8 that was included
in the old Ctrl_Base_DepTag calculation that did not seem to be included in
NumFloatRegs is associated with the comment "The indices that are mapped
over the FP stack".  So in retrospect, if we have 8 additional
architectural names for existing architectural registers (effectively
architectural aliases for existing FP reg state), then I suppose it does
make sense that NumFloatArchRegs = NumFloatRegs + 8.

It's not clear where we are using NumFloatRegs that having too large of a
value matters; it seems to me that having 8 extra regs of state shouldn't
break things.  Obviously that's not the case though.

I suggest taking that last '+ 8' off the computation of NumFloatRegs, then
adding it back in in the enum, i.e.,
CC_Reg_Base = FP_Reg_Base + NumFloatRegs + 8,
and see if that makes your regression pass.  I hope so.  If so, then we'll
have to decide whether we want to do anything fancier, like reintroduce
NumFloatArchRegs, or just use that code and add in some comments to explain
the situation.

Thanks,

Steve



On Mon, Feb 3, 2014 at 4:16 AM, Andreas Sandberg <[email protected]>wrote:

Hi Everyone,

I was just testing some x87 stuff on a new gem5 version and it seems like
we have a pretty nasty regression. In recent versions of the code base, it
seems like the stack isn't handled correctly. I was able to bisect down to
the offending commit (7274310be1bb, isa: clean up register constants),
which changes ISA constants on Alpha, Mips, SPARC, and x86.

As far as I can tell, the NumFloatRegs constant is increased by 8, but
Ctrl_Base_DepTag remains constant. I'm not sure how any of this affects the
stack handling though. Could someone with more knowledge of the dark arts
of x86 CPU simulation have a look? Steve?


The test case I used was the following code:
   fninit
   fldcw fctl_extended
   fld1
   fldpi
   fldpi
...
   fctl_extended:
           .word 0x037f



These are the results I expect to get (and do get on KVM):
info: FPU registers (XSave):
info:   fcw: 0x37f
info:   fsw: 0x2800 (top: 5, conditions: , exceptions:  )
info:   ftwx: 0xe0
info:   FP Stack:
info:           ST0/5: 0x35c26821a2da0fc90040 (3.14159)
info:           ST1/6: 0x35c26821a2da0fc90040 (3.14159)
info:           ST2/7: 0x0000000000000080ff3f (1)

On the Atomic CPU I get the following:
info: FPU registers (XSave):
info:   fcw: 0x37f
info:   fsw: 0x2800 (top: 5, conditions: , exceptions:  )
info:   ftwx: 0xe0
info:   FP Stack:
info:           ST0/5: 0x00000000000000000000 (0)
info:           ST1/6: 0x00000000000000000000 (0)
info:           ST2/7: 0x00000000000000000000 (0)
info:           ST3/0: 0x00507721a2da0fc90040 (3.14159) (e)
info:           ST4/1: 0x00507721a2da0fc90040 (3.14159) (e)
info:           ST5/2: 0x0000000000000080ff3f (1) (e)

//Andreas

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev


_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev

Re: [gem5-dev] x86 FPU stack handling regression

Reply via email to