Re: [gem5-dev] O3 FS and Ruby

Nilay Vaish Fri, 30 Dec 2011 13:16:58 -0800

The following patch takes care of the current problem. I am able to replicate the O3 full system regression test with Ruby in place of the classic memory system.


diff --git a/src/cpu/o3/commit.hh b/src/cpu/o3/commit.hh
--- a/src/cpu/o3/commit.hh
+++ b/src/cpu/o3/commit.hh
@@ -443,6 +443,9 @@
     /** Rename map interface. */
     RenameMap *renameMap[Impl::MaxThreads];

+    /* Variable for keeping track if an instruction is in middle of execution */
+    bool instruction_in_flight;
+
     /** Updates commit stats based on this instruction. */
     void updateComInstStats(DynInstPtr &inst);

diff --git a/src/cpu/o3/commit_impl.hh b/src/cpu/o3/commit_impl.hh
--- a/src/cpu/o3/commit_impl.hh
+++ b/src/cpu/o3/commit_impl.hh
@@ -105,7 +105,8 @@
       numThreads(params->numThreads),
       drainPending(false),
       switchedOut(false),
-      trapLatency(params->trapLatency)
+      trapLatency(params->trapLatency),
+      instruction_in_flight(false)
 {
     _status = Active;
     _nextStatus = Inactive;
@@ -717,7 +718,7 @@

     // Wait until all in flight instructions are finished before enterring
     // the interrupt.
-    if (cpu->instList.empty()) {
+    if (cpu->instList.empty() && (!instruction_in_flight)) {
         // Squash or record that I need to squash this cycle if
         // an interrupt needed to be handled.
         DPRINTF(Commit, "Interrupt detected.\n");
@@ -990,6 +991,8 @@
                     cpu->instDone(tid);
                 }

+                instruction_in_flight = !head_inst->isLastMicroop();
+
                 // Updates misc. registers.
                 head_inst->updateMiscRegs();
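
The gating logic in the patch can be tried in isolation. The following is a minimal sketch, not gem5 code: `MicroOp`, `CommitModel`, `instListEmpty` and `canTakeInterrupt` are hypothetical stand-ins for `head_inst`, the commit stage, `cpu->instList.empty()` and the patched `if` condition. It only shows that an interrupt is held off between the first and last microop of a macroop, even when the instruction list is empty.

```cpp
#include <cassert>

// Hypothetical stand-in for a dynamic instruction; it models only the
// one query the patch relies on.
struct MicroOp {
    bool lastMicroop;
    bool isLastMicroop() const { return lastMicroop; }
};

// Toy model of the commit-stage state touched by the patch (assumed names).
struct CommitModel {
    bool instListEmpty = true;         // stands in for cpu->instList.empty()
    bool instructionInFlight = false;  // the flag added by the patch

    // Mirrors the line added in the commit path: the flag stays set until
    // the last microop of the current macroop has committed.
    void commit(const MicroOp &op) {
        instructionInFlight = !op.isLastMicroop();
    }

    // Mirrors the patched interrupt condition.
    bool canTakeInterrupt() const {
        return instListEmpty && !instructionInFlight;
    }
};
```

Committing a non-final microop leaves `canTakeInterrupt()` false; committing the final one makes it true again, which is the behavior the patch is after.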

--
Nilay

On Fri, 30 Dec 2011, Nilay Vaish wrote:

Gabe, from all the traces that I have been through, I never found your explanation to be true. Here is an excerpt from your first email in that thread --
----
It turns out the problem is that an instruction is started which returns from kernel to user level and is microcoded. The instruction is fetched from the kernel's address space successfully and starts to execute, along the way dropping down to user mode. Some microops later, there's some microop control flow which O3 mispredicts. When it squashes the mispredict and tries to restart, it first tries to refetch the instruction involved. Since it's now at user level and the instruction is on a kernel level only page, there's a page fault and things go downhill from there.
----
You claim that because of branch misprediction, the instruction needs to be refetched. I never saw that happening. In all the cases, your fix, in which the fetch stage picks the current macroop from the just now squashed instruction, worked as expected.
What I saw was that an isSquashAfter microop squashes everything and this makes the O3 CPU start handling an interrupt. Returning from the interrupt, it faults since the CS register had been overwritten by the sysret instruction, but not the instruction pointer.
Again, I think we need to change the behavior of isSquashAfter.
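
The failure described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the real x86 microcode: the three-microop sequence (write CS, squash-after, write RIP), the `Uop` enum, the register values and `stateSeenByInterrupt` are all hypothetical. It shows what an interrupt handler would observe if the interrupt were taken right after the squash-after microop rather than at the macroop boundary.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Only the two pieces of architectural state mentioned in the thread.
struct ArchState {
    std::uint16_t cs;
    std::uint64_t rip;
};

// Assumed microop kinds for a sysret-like macroop (illustrative only).
enum class Uop { WriteCS, SquashAfter, WriteRIP };

// Illustrative values; they are not taken from any trace.
constexpr std::uint16_t KernelCS = 0x10;
constexpr std::uint16_t UserCS = 0x33;
constexpr std::uint64_t KernelRIP = 0xffffffff80001000ULL;
constexpr std::uint64_t UserRIP = 0x400000ULL;

// Commits microops in order and returns the state an interrupt would
// save if it were taken right after microop index `interruptAfter`.
inline ArchState stateSeenByInterrupt(const std::vector<Uop> &uops,
                                      std::size_t interruptAfter) {
    ArchState s{KernelCS, KernelRIP};
    for (std::size_t i = 0; i < uops.size(); ++i) {
        switch (uops[i]) {
          case Uop::WriteCS:     s.cs = UserCS;   break;
          case Uop::WriteRIP:    s.rip = UserRIP; break;
          case Uop::SquashAfter: break;  // empties the pipeline, no state change
        }
        if (i == interruptAfter)
            return s;
    }
    return s;
}
```

With the sequence `{WriteCS, SquashAfter, WriteRIP}`, interrupting after index 1 yields a user-level CS paired with a kernel instruction pointer, the inconsistent pair that faults on return. Interrupting after index 2 yields a consistent user-level state.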

--
Nilay

On Fri, 30 Dec 2011, Korey Sewell wrote:
I can't vouch for reading all the emails but I have gone through this whole
thread (which dates back to Nov. 29th).

Also, I'm not all the way familiar with x86 so maybe this excludes me from
understanding the problem at the detailed level, but I think I am starting
to get a good grasp of the general squashing problem here (basically
maintaining squash state through exception events).

My concern is that if you don't literally "fix" the problem first, you can
get caught up in the minutiae of making this big grand sweeping change and
then have no good way to say if "the fix" fixes anything in the first place.
If Nilay or anyone could get something to the reviewboard that worked, hack
or not, then that would be a good step toward making the "clean" change
that I think you're referring to, Gabe. We don't have to commit the code, but
on a 1st pass working is better than "not working", right? :)

(Gabe, I do understand it can be frustrating explaining the same things
over/over again.)

On Fri, Dec 30, 2011 at 3:48 AM, Gabe Black <[email protected]> wrote:
If you read my emails the problem would already be identified and
understood, because I did that weeks or even months ago and explained it
multiple times. A hack fix is not ok. A hack fix is why this is still
broken in the first place. That's also something I explained in my emails.

Gabe

On 12/30/11 02:50, Korey Sewell wrote:
I agree with you Gabe that the squashing mechanism could be cleaned up.

But I'd also suggest that Nilay should understand/identify the problem
first and then implement a first-pass fix without a big squashing revamp
(if possible).

That way, when we (Nilay, you, me, whoever) get to revamping the squash
code, there is at least a set test case and trace we can use to debug
with.
On Fri, Dec 30, 2011 at 2:30 AM, Gabe Black <[email protected]>
wrote:
What was unclear about this email and the ones before it? Did you not
believe me for some reason? You've spent about a month partially
rediscovering what I explained in them. I've already said how this needs
to be fixed.

Gabe
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
--
- Korey