On 06/22/2013 05:33 AM, Nilay Vaish wrote:
On Wed, 19 Jun 2013, Andreas Sandberg wrote:
Hi Nilay,
Sorry for not replying earlier, I've been pretty busy fixing up bugs
in the x86 frontend to make SPEC CPU2006 actually work on x86. About
20% of the benchmarks in my setup execute unsupported instructions
and don't verify correctly (some even terminate long before they are
supposed to).
My take on the state of things at the moment is that there is
probably a bug if the second CPU is still executing in microcode,
because no CPU in a drained system should ever be executing microcode.
The patch as it is now breaks all the assumptions we make about a
drained system, so I'm still opposed to committing it.
I did some investigations of my own and it seems like the second CPU
is held at pc 0xfffffff0.0 (i.e., not executing microcode early in the
boot). Could you try to find out why the second CPU in your system is
stuck in microcode? I suspect that the fact that your system is stuck
in microcode is actually a bug.
We should probably clean up the whole CPU state mess once and for all
at some point. I'm still not exactly sure, despite my implementing my
own CPU model, what is expected to happen in a CPU when the different
methods controlling state transitions are called. Not to mention
where the methods are called from and why. We really need to document
that.
So here is what happens. cpu0 sends an INIT interrupt to cpu1. The
code for the interrupt appears in the file src/arch/x86/faults.cc,
where the processor state is initialized and the micropc is set to
X86ISAInst::RomLabels::extern_label_initIntHalt. This label appears in
src/arch/x86/isa/insts/romutil.py. The following five instructions are
executed:
def rom
{
    extern initIntHalt:
    rflags t1
    limm t2, "~IFBit"
    and t1, t1, t2
    wrflags t1, t0
    halt
    eret
};
The halt instruction suspends the thread context, and the micro PC
remains close to the address of the label.
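
As a rough illustration of the behaviour described above, here is a toy
Python model of that micro-op sequence. It is not gem5 code: the names
ROM and run_until_suspended are made up for this sketch, and it assumes
that the micro PC simply advances by one per micro-op and that halt
stops execution immediately after it is executed.

```python
# Toy model only: these names are illustrative and are not gem5 classes.
# The list mirrors the micro-ops at the initIntHalt label.
ROM = ["rflags", "limm", "and", "wrflags", "halt", "eret"]


def run_until_suspended(rom):
    """Step through the micro-ops; return (upc, suspended) when execution stops."""
    upc = 0
    while upc < len(rom):
        op = rom[upc]
        upc += 1  # assumed: the micro PC advances past the op just executed
        if op == "halt":
            # The thread context is suspended here; eret has not run yet.
            return upc, True
    return upc, False


upc, suspended = run_until_suspended(ROM)
print(upc, suspended, ROM[upc])
```

Under these assumptions halt is the fifth micro-op executed, and the
suspended thread is left with a micro PC that still points into the ROM
sequence, at the pending eret.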
Since you observed a different code path, my guess is that we are
using different versions of the Linux kernel, or you are using a
completely different OS. In either case, I am now even more convinced
that a CPU in the Idle state has no PC/uPC attached to it, so you
should not worry about what its value is.
I wasn't actually waiting for the system to boot completely, so I
probably never reached the point where the init interrupt was sent.
I'm pretty sure you are mistaken about the lack of a valid PC/uPC. The
Intel manual states: "If an interrupt (including NMI) is used to resume
execution after a HLT instruction, the saved instruction pointer
(CS:EIP) points to the instruction following the HLT instruction."
So clearly, the PC is valid and will be used in all but special cases.
I'm pretty sure I've seen cases (for example idle loops) where the CPU
executes WFI instructions on ARM (and presumably HLT instructions on
x86) where the interrupt handler returns to the instruction after the
WFI/HLT. The case where the PC/uPC doesn't matter is just a special case.
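
The idle-loop case can be sketched with a small Python model. This is
only an illustration of the quoted HLT semantics: the addresses, the
IDLE_LOOP table and run_idle_loop are hypothetical, and the interrupt
handler is assumed to return straight to the saved instruction pointer.

```python
# Hypothetical two-instruction idle loop: HLT, then a jump back to the HLT.
# Each entry is (mnemonic, argument); for hlt the argument is its length in
# bytes, for jmp it is the jump target. Addresses are illustrative.
IDLE_LOOP = {
    0x1000: ("hlt", 1),
    0x1001: ("jmp", 0x1000),
}


def run_idle_loop(num_interrupts):
    """Return the instruction pointer saved at each wake-up interrupt."""
    ip = 0x1000
    saved = []
    for _ in range(num_interrupts):
        op, length = IDLE_LOOP[ip]
        assert op == "hlt"
        resume_ip = ip + length  # saved pointer refers to the insn after HLT
        saved.append(resume_ip)  # handler runs, then returns to resume_ip
        op, target = IDLE_LOOP[resume_ip]
        assert op == "jmp"
        ip = target  # loop back and halt again
    return saved


saved_ips = run_idle_loop(2)
print([hex(ip) for ip in saved_ips])
```

In this model every wake-up resumes at the instruction following the
HLT, so the PC of the halted CPU has to stay valid while it is idle.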
I think the cleanest way of solving this would be to leave draining as
it is and introduce a "return and halt" microop that we use instead of
"halt; eret" in the init microcode. This would solve the deadlock
problem since the CPU wouldn't be executing microcode (the halt_eret
instruction would commit, but suspend execution before executing the
first instruction following the microcode sequence).
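
To make the difference concrete, here is a toy Python comparison of the
two sequences. Everything in it is an assumption made for illustration:
halt_eret is the proposed micro-op and does not exist yet, and
final_state and can_drain are hypothetical helpers that encode the rule
from this thread that a drained CPU must not be in the middle of a
microcode sequence.

```python
def final_state(rom):
    """Run micro-ops until the thread suspends; report where it stopped."""
    for upc, op in enumerate(rom):
        if op == "halt":
            # Suspended mid-sequence if any micro-op (e.g. eret) is pending.
            return {"suspended": True, "in_microcode": upc + 1 < len(rom)}
        if op == "halt_eret":
            # Hypothetical fused micro-op: it commits and leaves microcode,
            # then suspends before the next macro-instruction executes.
            return {"suspended": True, "in_microcode": False}
    return {"suspended": False, "in_microcode": False}


def can_drain(state):
    # Assumed drain rule: no CPU may be part-way through microcode.
    return not state["in_microcode"]


current = final_state(["rflags", "limm", "and", "wrflags", "halt", "eret"])
proposed = final_state(["rflags", "limm", "and", "wrflags", "halt_eret"])
print(can_drain(current), can_drain(proposed))
```

In this sketch both variants leave the CPU suspended, but only the
fused variant leaves it at an instruction boundary where the existing
drain check would succeed.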
//Andreas
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev