Thinking back on the changes I recently posted on reviewboard, I believe they happen to fix this issue: the single predecoder is merged into the multiple decoders, which forces there to be one predecoder per thread. Adding an x86 SMT regression makes a lot of sense to me.

Gabe

Quoting Steve Reinhardt <[email protected]>:

I can't offer much detailed help on the problem, but I figured out how we
got here: the only SMT regression test we have uses the Alpha ISA, in which
I think the predecoder is stateless (or nearly so).  It's also a very
simple test (two copies of hello world).

Clearly we need to add an SMT x86 test to our regressions.
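
To illustrate why a stateful predecoder matters under SMT, here is a toy sketch. It is not gem5 code: ToyPredecoder, its length-prefixed instruction encoding, and decodeInterleaved are all hypothetical names invented for this example. It only shows how one predecoder shared between interleaved threads mixes their bytes, while one predecoder per thread does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Inst = std::vector<std::uint8_t>;
using Stream = std::vector<std::uint8_t>;

// Toy stand-in for a stateful predecoder (NOT gem5's TheISA::Predecoder).
// Assumed encoding: the first byte of an instruction is its total length
// in bytes, followed by (length - 1) payload bytes.
class ToyPredecoder
{
  public:
    // Feed one byte. Returns true once a whole instruction is buffered.
    bool
    moreByte(std::uint8_t byte)
    {
        if (remaining == 0) {
            // Start of a new instruction: the byte is the length prefix.
            current.clear();
            remaining = (byte == 0) ? 1 : byte;
        }
        current.push_back(byte);
        --remaining;
        return remaining == 0;
    }

    const Inst &inst() const { return current; }

    void reset() { current.clear(); remaining = 0; }

  private:
    Inst current;               // bytes of the partially decoded instruction
    std::size_t remaining = 0;  // bytes still needed to complete it
};

// Feed the threads' byte streams round-robin, one byte per thread per step,
// loosely mimicking an SMT fetch stage. With perThread == false every thread
// uses the same predecoder; with perThread == true each thread has its own.
// Returns, per thread, the instructions completed by that thread's bytes.
std::vector<std::vector<Inst>>
decodeInterleaved(const std::vector<Stream> &streams, bool perThread)
{
    const std::size_t numThreads = streams.size();
    std::vector<ToyPredecoder> predecoders(perThread ? numThreads : 1);
    std::vector<std::vector<Inst>> decoded(numThreads);

    std::size_t longest = 0;
    for (const Stream &s : streams) {
        if (s.size() > longest)
            longest = s.size();
    }

    for (std::size_t i = 0; i < longest; ++i) {
        for (std::size_t tid = 0; tid < numThreads; ++tid) {
            if (i >= streams[tid].size())
                continue;
            ToyPredecoder &pd = predecoders[perThread ? tid : 0];
            if (pd.moreByte(streams[tid][i]))
                decoded[tid].push_back(pd.inst());
        }
    }
    return decoded;
}
```

With a fixed-width ISA the buffered state is trivial, so sharing goes unnoticed; in this toy variable-length encoding, the shared predecoder treats thread 1's bytes as part of thread 0's instruction.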

Steve

On Wed, May 16, 2012 at 2:11 PM, Michael Levenhagen <[email protected]> wrote:

I naively made the following change:

diff -r eaf80fe8a57e src/cpu/o3/fetch.hh
--- a/src/cpu/o3/fetch.hh   Wed May 09 18:16:19 2012 -0600
+++ b/src/cpu/o3/fetch.hh   Wed May 16 15:00:23 2012 -0600
@@ -447,7 +447,7 @@
    BPredUnit branchPred;

    /** Predecoder. */
-    TheISA::Predecoder predecoder;
+    TheISA::Predecoder predecoders[Impl::MaxThreads];

    TheISA::PCState pc[Impl::MaxThreads];

diff -r eaf80fe8a57e src/cpu/o3/fetch_impl.hh
--- a/src/cpu/o3/fetch_impl.hh  Wed May 09 18:16:19 2012 -0600
+++ b/src/cpu/o3/fetch_impl.hh  Wed May 16 15:00:23 2012 -0600
@@ -136,7 +136,6 @@
 DefaultFetch<Impl>::DefaultFetch(O3CPU *_cpu, DerivO3CPUParams *params)
    : cpu(_cpu),
      branchPred(params),
-      predecoder(NULL),
      numInst(0),
      decodeToFetchDelay(params->decodeToFetchDelay),
      renameToFetchDelay(params->renameToFetchDelay),
@@ -758,7 +757,7 @@
        macroop[tid] = squashInst->macroop;
    else
        macroop[tid] = NULL;
-    predecoder.reset();
+    predecoders[tid].reset();

    // Clear the icache miss if it's outstanding.
    if (fetchStatus[tid] == IcacheWaitResponse) {
@@ -1251,7 +1250,7 @@
        // We need to process more memory if we aren't going to get a
        // StaticInst from the rom, the current macroop, or what's already
        // in the predecoder.
-        bool needMem = !inRom && !curMacroop && !predecoder.extMachInstReady();
+        bool needMem = !inRom && !curMacroop && !predecoders[tid].extMachInstReady();

        if (needMem) {
            if (blkOffset >= numInsts) {
@@ -1272,10 +1271,10 @@
            }
            MachInst inst = TheISA::gtoh(cacheInsts[blkOffset]);

-            predecoder.setTC(cpu->thread[tid]->getTC());
-            predecoder.moreBytes(thisPC, fetchAddr, inst);
+            predecoders[tid].setTC(cpu->thread[tid]->getTC());
+            predecoders[tid].moreBytes(thisPC, fetchAddr, inst);

-            if (predecoder.needMoreBytes()) {
+            if (predecoders[tid].needMoreBytes()) {
                blkOffset++;
                fetchAddr += instSize;
                pcOffset += instSize;
@@ -1286,9 +1285,9 @@
        // the memory we've processed so far.
        do {
            if (!(curMacroop || inRom)) {
-                if (predecoder.extMachInstReady()) {
+                if (predecoders[tid].extMachInstReady()) {
                    ExtMachInst extMachInst =
-                        predecoder.getExtMachInst(thisPC);
+                        predecoders[tid].getExtMachInst(thisPC);
                    staticInst =
                        decoder.decode(extMachInst, thisPC.instAddr());

@@ -1360,7 +1359,7 @@
                status_change = true;
                break;
            }
-        } while ((curMacroop || predecoder.extMachInstReady()) &&
+        } while ((curMacroop || predecoders[tid].extMachInstReady()) &&
                 numInst < fetchWidth);
    }

The hello programs make it further, but the run still aborts:

dependGraph[502]: No producer. consumer:
dependGraph[503]: No producer. consumer:
dependGraph[504]: No producer. consumer:
dependGraph[505]: No producer. consumer:
dependGraph[506]: No producer. consumer:
dependGraph[507]: No producer. consumer:
dependGraph[508]: No producer. consumer:
dependGraph[509]: No producer. consumer:
dependGraph[510]: No producer. consumer:
dependGraph[511]: No producer. consumer:
memAllocCounter: 26
panic: Dependency graph 55 not empty!
 @ cycle 435000
[addToProducers:build/X86_SE/cpu/o3/inst_queue_impl.hh, line 1312]


Mike





_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users



