Hi Korey,

Just to clarify, the deadlock threshold in the sequencer is different from the 
deadlock threshold in the mem tester.  The sequencer's deadlock mechanism 
detects whether any particular request takes longer than the threshold, 
while the mem tester's deadlock threshold just ensures that a particular cpu 
sees at least one request complete within the threshold.  I don't think we 
want to degrade the deadlock checker to just a warning.  While in this 
particular case the deadlock turned out to be just a performance issue, in my 
experience the vast majority of potential deadlock detections turn out to be 
real bugs.
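
To illustrate the difference, here's a rough sketch of the two checks (a 
sketch only; the names here are made up and not the actual code):
"
// Sequencer-style check: flag any single request that has been
// outstanding longer than the threshold.
for (const auto &req : outstandingRequests) {
    if (curTick() - req.issueTick > deadlockThreshold)
        panic("request outstanding for %d ticks",
              curTick() - req.issueTick);
}

// Mem-tester-style check: only require that *some* response completes;
// noResponseCycles is reset whenever any response arrives.
if (++noResponseCycles >= deadlockThreshold)
    fatal("no response seen for %d cycles", noResponseCycles);
"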

Later today I'll check in a patch that increases the ruby mem test deadlock 
threshold.

Brad


From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of 
Korey Sewell
Sent: Monday, February 07, 2011 2:27 PM
To: M5 Developer List
Subject: Re: [m5-dev] changeset in m5: Ruby: Fixes MESI CMP directory protocol

Another followup on this is that the "deadlock_threshold" parameter doesn't 
propagate to the MemTester CPU.

So when I'm testing 64 CPUs, the memtester.cc still has this code:
 "   if (!tickEvent.scheduled())
        schedule(tickEvent, curTick() + ticks(1));

    if (++noResponseCycles >= 500000) {
        if (issueDmas) {
            cerr << "DMA tester ";
        }
        cerr << name() << ": deadlocked at cycle " << curTick() << endl;
        fatal("");
    }
"

That hardcoded 500000 is not a great number (as people have said) because as 
your topologies/memory hierarchies change, the max # of cycles that you have 
to wait for a response can also change, right?

Increasing that # by hand is an arduous thing to do, so maybe that # should 
come from a parameter, and maybe we should also "warn" there that a deadlock 
is possible after some inordinate wait time.
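
For example, something like this in memtest.cc (just a sketch; the 
deadlock_threshold parameter name is made up and would also need to be added 
to the MemTest Python class):
"
// Constructor: read the threshold from a config parameter instead of
// hardcoding 500000 in tick().
MemTest::MemTest(const Params *p)
    : MemObject(p),
      // ... existing member initializers ...
      deadlockThreshold(p->deadlock_threshold)
{
}

// In tick():
if (++noResponseCycles >= deadlockThreshold) {
    if (issueDmas)
        cerr << "DMA tester ";
    cerr << name() << ": deadlocked at cycle " << curTick() << endl;
    fatal("");
}
"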

The fix should be just to warn about a long wait after an inordinate 
period... Something like this, I think:
"
if (++noResponseCycles % 500000 == 0) {
    warn("%s has waited for %i cycles", name(), noResponseCycles);
}
"

Lastly, should the memtester really send out a memory access on every tick? The 
actual injection rate could be much higher than the rate at which we resolve 
contention.

Maybe we should consider having X many outstanding requests per CPU as a more 
realistic measure that can stress the system but not make the noResponseCycles 
counter grow to such a high number.
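
In MemTest::tick() that might look something like this (again just a sketch; 
maxOutstanding and outstandingRequests are made-up names):
"
// Skip issuing a new access this cycle if we already have
// maxOutstanding requests in flight; just try again next tick.
if (outstandingRequests >= maxOutstanding) {
    schedule(tickEvent, curTick() + ticks(1));
    return;
}
// Otherwise build and send the next request as before, incrementing
// outstandingRequests when it is sent and decrementing it when the
// matching response comes back in completeRequest().
"
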
On Mon, Feb 7, 2011 at 1:27 PM, Beckmann, Brad 
<brad.beckm...@amd.com> wrote:
Yep, if I increase the deadlock threshold to 5 million cycles, the deadlock 
warning is not encountered.  However, I don't think that we should increase the 
default deadlock threshold by an order of magnitude.  Instead, let's just 
increase the threshold for the mem tester.  How about I check in the following 
small patch?

Brad


diff --git a/configs/example/ruby_mem_test.py b/configs/example/ruby_mem_test.py
--- a/configs/example/ruby_mem_test.py
+++ b/configs/example/ruby_mem_test.py
@@ -135,6 +135,12 @@
    cpu.test = system.ruby.cpu_ruby_ports[i].port
    cpu.functional = system.funcmem.port

+    #
+    # Since the memtester is incredibly bursty, increase the deadlock
+    # threshold to 5 million cycles
+    #
+    system.ruby.cpu_ruby_ports[i].deadlock_threshold = 5000000
+
 for (i, dma) in enumerate(dmas):
    #
    # Tie the dma memtester ports to the correct functional port
diff --git a/tests/configs/memtest-ruby.py b/tests/configs/memtest-ruby.py
--- a/tests/configs/memtest-ruby.py
+++ b/tests/configs/memtest-ruby.py
@@ -96,6 +96,12 @@
     #
     cpus[i].test = ruby_port.port
     cpus[i].functional = system.funcmem.port
+
+     #
+     # Since the memtester is incredibly bursty, increase the deadlock
+     # threshold to 5 million cycles
+     #
+     ruby_port.deadlock_threshold = 5000000

 # -----------------------
 # run simulation



> -----Original Message-----
> From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org]
> On Behalf Of Nilay Vaish
> Sent: Monday, February 07, 2011 9:12 AM
> To: M5 Developer List
> Subject: Re: [m5-dev] changeset in m5: Ruby: Fixes MESI CMP directory
> protocol
>
> Brad, I also see the protocol getting into a deadlock. I tried to get a
> trace, but I get a segmentation fault (yes, the segmentation fault only
> occurs when the trace flag ProtocolTrace is supplied). It seems to me that
> memory is getting corrupted somewhere, because the fault occurs in malloc
> itself.
>
> It could be that the protocol is actually not in a deadlock. Both Arka and
> I had increased the deadlock threshold while testing the protocol. I will
> try with an increased threshold later in the day.
>
> One more thing, the Orion 2.0 code that was committed last night makes use
> of printf(). It did not compile cleanly for me. I had to change it to
> fatal() and include the header file base/misc.hh.
>
> --
> Nilay
>
> On Mon, 7 Feb 2011, Beckmann, Brad wrote:
>
> > FYI... if my local regression tests are correct, this patch does not
> > fix all the problems with the MESI_CMP_directory protocol.  One of the
> > patches I just checked in fixes a subtle bug in the ruby_mem_test.
> > Fixing this bug exposes more deadlock problems in the
> > MESI_CMP_directory protocol.
> >
> > To reproduce the regression tester's sequencer deadlock error, set the
> > Randomization flag to false in the file
> > configs/example/ruby_mem_test.py, then run the following command:
> >
> > build/ALPHA_SE_MESI_CMP_directory/m5.debug
> > configs/example/ruby_mem_test.py -n 8
> >
> > Let me know if you have any questions,
> >
> > Brad
> >
> >
> >> -----Original Message-----
> >> From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On
> >> Behalf Of Nilay Vaish
> >> Sent: Thursday, January 13, 2011 8:50 PM
> >> To: m5-dev@m5sim.org
> >> Subject: [m5-dev] changeset in m5: Ruby: Fixes MESI CMP directory
> >> protocol
> >>
> >> changeset 8f37a23e02d7 in /z/repo/m5
> >> details: http://repo.m5sim.org/m5?cmd=changeset;node=8f37a23e02d7
> >> description:
> >>    Ruby: Fixes MESI CMP directory protocol
> >>    The current implementation of MESI CMP directory protocol is
> broken.
> >>    This patch, from Arkaprava Basu, fixes the protocol.
> >>
> >> diffstat:
> >>
>



--
- Korey
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
