The following patch takes care of the current problem. I am able to
replicate the O3 full system regression test with Ruby in place of the
classic memory system.
diff --git a/src/cpu/o3/commit.hh b/src/cpu/o3/commit.hh
--- a/src/cpu/o3/commit.hh
+++ b/src/cpu/o3/commit.hh
@@ -443,6 +443,9 @@
/** Rename map interface. */
RenameMap *renameMap[Impl::MaxThreads];
+ /* Variable for keeping track if an instruction is in middle of
execution */
+ bool instruction_in_flight;
+
/** Updates commit stats based on this instruction. */
void updateComInstStats(DynInstPtr &inst);
diff --git a/src/cpu/o3/commit_impl.hh b/src/cpu/o3/commit_impl.hh
--- a/src/cpu/o3/commit_impl.hh
+++ b/src/cpu/o3/commit_impl.hh
@@ -105,7 +105,8 @@
numThreads(params->numThreads),
drainPending(false),
switchedOut(false),
- trapLatency(params->trapLatency)
+ trapLatency(params->trapLatency),
+ instruction_in_flight(false)
{
_status = Active;
_nextStatus = Inactive;
@@ -717,7 +718,7 @@
// Wait until all in flight instructions are finished before
enterring
// the interrupt.
- if (cpu->instList.empty()) {
+ if (cpu->instList.empty() && (!instruction_in_flight)) {
// Squash or record that I need to squash this cycle if
// an interrupt needed to be handled.
DPRINTF(Commit, "Interrupt detected.\n");
@@ -990,6 +991,8 @@
cpu->instDone(tid);
}
+ instruction_in_flight = !head_inst->isLastMicroop();
+
// Updates misc. registers.
head_inst->updateMiscRegs();
--
Nilay
On Fri, 30 Dec 2011, Nilay Vaish wrote:
Gabe, from all the traces that I have been through, I never found your
explanation to be true. Here is an excerpt from your first email in that
thread --
----
It turns out the problem is that an instruction is started which returns from
kernel to user level and is microcoded. The instruction is fetched from the
kernel's address space successfully and starts to execute, along the way
dropping down to user mode. Some microops later, there's some microop control
flow which O3 mispredicts. When it squashes the mispredict and tries to
restart, it first tries to refetch the instruction involved. Since it's now
at user level and the instruction is on a kernel level only page, there's a
page fault and things go downhill from there.
----
You claim that because of branch misprediction, the instruction needs to be
refetched. I never saw that happening. In all the cases your fix, in which
the fetch stage picks the current macroop from the just now squashed
instruction, worked as expected.
What I saw was that an isSquashAfter microop squashes everything and this
makes the O3 CPU start on handling interrupt. Returning from the interrupt,
it faults since the CS register had been overwritten by the sysret
instruction, but not the instruction pointer.
Again, I think we need to change the behavior of isSquashAfter.
--
Nilay
On Fri, 30 Dec 2011, Korey Sewell wrote:
I can't vouch for reading all the emails but I have gone through this whole
thread (which dates back to Nov. 29th).
Also, I'm not all the way familiar with x86 so maybe this excludes me from
understanding the problem at the detailed level, but I think I am starting
to get a good grasp of the general squashing problem here (basically
maintaining squash state through exception events).
My concern is that if you don't literally "fix" the problem first, you can
get caught up in the minutia of making this big grand sweeping change and
then have no good way to say if "the fix" fixes anything in the first
place.
If Nilay or anyone could get something to the reviewboard that worked, hack
or not, then that would be a good step toward making the "clean" change
that I think you're referring to Gabe. We dont have to commit the code, but
on a 1st pass working is better then "not working", right? :)
(Gabe, I do understand it can be frustrating explaining the same things
over/over again.)
On Fri, Dec 30, 2011 at 3:48 AM, Gabe Black <[email protected]> wrote:
If you read my emails the problem would already be identified and
understood, because I did that weeks or even months ago and explained it
multiple times. A hack fix is not ok. A hack fix is why this is still
broken in the first place. That's also something I explained in my emails.
Gabe
On 12/30/11 02:50, Korey Sewell wrote:
I agree with you Gabe that the squashing mechanism could be cleaned up.
But I'd also suggest that Nilay should understand/identify the problem
first and then implement a first-pass fix without a big squashing revamp
(if possible).
That way, when we (nilay, you, me, whoever) gets to revamping the squash
code, there is at least a set test case and trace we can use to debug
with..
On Fri, Dec 30, 2011 at 2:30 AM, Gabe Black <[email protected]>
wrote:
What was unclear about this email and the ones before it? Did you not
believe me for some reason? You've spent about a month partially
rediscovering what I explained in them. I've already said how this needs
to be fixed.
Gabe
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
--
- Korey
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev