Steve, Thanks for looking into this.
Hmm, it seems to make sense that NumFloatArchRegs would be larger than NumFloatRegs on x86 because of the way MMX and x87 registers are mapped. To make things even more "interesting", we are using the the NUM_FLOATREGS constant in the microassembler (~old NumFloatRegs).
It seems like the problem stems from the way we handle the FP stack in the microassembler. Stack-relative registers are mapped to a set of virtual registers after the normal FP registers, these have indices in the range [NUM_FLOATREGS, NUM_FLOATREGS+8). Normally, these registers are read through the ThhreadContext::setFloatReg() interface which means that the index is flattened and stack-relative registers are mapped into actual registers (X86ISA::flattenFloatIndex()). The problem here is X86ISA::copyRegs(), where we used setFloatRegBits() instead of setFloatRegBitsFlat(). This causes the x87 stack to be copied twice and since the MISCREG_X87_TOP is undefined (likely 0) in the destination TC, the second copy goes to the wrong location on the stack.
I'll post a patch shortly. //Andreas On 2014-02-04 00:01, Steve Reinhardt wrote:
Hi Andreas, I won't claim to having more knowledge of the dark arts here... I think the issue is that that knowledge is lost in the mists of time and can only be reconstructed by studying the ancient runes. Looking at the changeset you point out, I think the confusion I had was based on the distinction between NumFloatRegs and NumFloatArchRegs. Intuitively the former should be the number of actual registers (state), while the latter should be the number of architecturally visible registers. Thus you would expect NumFloatArchRegs <= NumFloatRegs, and in fact that's the case for all the other ISAs where NumFloatArchRegs is defined: for Power and MIPS we have: NumFloatRegs = NumFloatArchRegs + NumFloatSpecialRegs while for Alpha and SPARC we have: NumFloatRegs = NumFloatArchRegs (though for Power, NumFloatSpecialRegs = 0, so it could be written the latter way as well). However, in x86, we had: NumFloatArchRegs = NumFloatRegs + 8 which just seemed backwards. In addition, we typically have the case where Ctrl_Reg_Base = FP_Reg_Base + NumFloatRegs, which wasn't true, because instead we had trl_Reg_Base = FP_Reg_Base + NumFloatArchRegs (effectively, even though the value was calculated separately). Compounding the confusion is the fact that NumFloatArchRegs is not always defined and very rarely used (and not used at all for x86). So I "simplified" things by getting rid of NumFloatArchRegs and redefining NumFloatRegs to be the old value of NumFloatArchRegs (which was 8 larger than its previous value). All the x86 regressions at the time passed, so I assumed that I had not broken anything in doing so. Looking at the changeset again, I see that the quantity 8 that was included in the old Ctrl_Base_DepTag calculation that did not seem to be included in NumFloatRegs is associated with the comment "The indices that are mapped over the FP stack". So in retrospect, if we have 8 additional architectural names for existing architectural registers (effectively architectural aliases for existing FP reg state), then I suppose it does make sense that NumFloatArchRegs = NumFloatRegs + 8. It's not clear where we are using NumFloatRegs that having too large of a value matters; it seems to me that having 8 extra regs of state shouldn't break things. Obviously that's not the case though. I suggest taking that last '+ 8' off the computation of NumFloatRegs, then adding it back in in the enum, i.e., CC_Reg_Base = FP_Reg_Base + NumFloatRegs + 8, and see if that makes your regression pass. I hope so. If so, then we'll have to decide whether we want to do anything fancier, like reintroduce NumFloatArchRegs, or just use that code and add in some comments to explain the situation. Thanks, Steve On Mon, Feb 3, 2014 at 4:16 AM, Andreas Sandberg <[email protected]>wrote:Hi Everyone, I was just testing some x87 stuff on a new gem5 version and it seems like we have a pretty nasty regression. In recent versions of the code base, it seems like the stack isn't handled correctly. I was able to bisect down to the offending commit (7274310be1bb, isa: clean up register constants), which changes ISA constants on Alpha, Mips, SPARC, and x86. As far as I can tell, the NumFloatRegs constant is increased by 8, but Ctrl_Base_DepTag remains constant. I'm not sure how any of this affects the stack handling though. Could someone with more knowledge of the dark arts of x86 CPU simulation have a look? Steve? The test case I used was the following code: fninit fldcw fctl_extended fld1 fldpi fldpi ... fctl_extended: .word 0x037f These are the results I expect to get (and do get on KVM): info: FPU registers (XSave): info: fcw: 0x37f info: fsw: 0x2800 (top: 5, conditions: , exceptions: ) info: ftwx: 0xe0 info: FP Stack: info: ST0/5: 0x35c26821a2da0fc90040 (3.14159) info: ST1/6: 0x35c26821a2da0fc90040 (3.14159) info: ST2/7: 0x0000000000000080ff3f (1) On the Atomic CPU I get the following: info: FPU registers (XSave): info: fcw: 0x37f info: fsw: 0x2800 (top: 5, conditions: , exceptions: ) info: ftwx: 0xe0 info: FP Stack: info: ST0/5: 0x00000000000000000000 (0) info: ST1/6: 0x00000000000000000000 (0) info: ST2/7: 0x00000000000000000000 (0) info: ST3/0: 0x00507721a2da0fc90040 (3.14159) (e) info: ST4/1: 0x00507721a2da0fc90040 (3.14159) (e) info: ST5/2: 0x0000000000000080ff3f (1) (e) //Andreas _______________________________________________ gem5-dev mailing list [email protected] http://m5sim.org/mailman/listinfo/gem5-dev_______________________________________________ gem5-dev mailing list [email protected] http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________ gem5-dev mailing list [email protected] http://m5sim.org/mailman/listinfo/gem5-dev
