Hi Korey,

Just to clarify, the deadlock threshold in the sequencer is different from the deadlock threshold in the mem tester. The sequencer's deadlock mechanism detects whether any particular request takes longer than the threshold, while the mem tester's deadlock threshold just ensures that a particular CPU sees at least one request complete within the threshold.

I don't think we want to degrade the deadlock checker to just a warning. While in this particular case the deadlock turned out to be just a performance issue, in my experience the vast majority of potential deadlock detections turn out to be real bugs.
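To make the distinction concrete, here is a minimal, self-contained sketch of the two styles of check. The type and member names are invented for illustration; this is not the actual gem5/Ruby code.

    #include <cstdint>
    #include <map>

    // Sequencer-style check: every outstanding request is timed individually,
    // and a deadlock is flagged if any single request has been outstanding
    // longer than the threshold.
    struct SequencerStyleCheck {
        uint64_t threshold;
        std::map<uint64_t, uint64_t> issueTick;   // request id -> issue tick

        void issue(uint64_t id, uint64_t now) { issueTick[id] = now; }
        void complete(uint64_t id)            { issueTick.erase(id); }

        bool deadlocked(uint64_t now) const {
            for (const auto &req : issueTick)
                if (now - req.second >= threshold)
                    return true;                  // one request stalled too long
            return false;
        }
    };

    // MemTester-style check: a single counter of cycles since *any* response
    // arrived. As long as at least one request completes within the threshold,
    // the counter is reset and no deadlock is reported.
    struct TesterStyleCheck {
        uint64_t threshold;
        uint64_t noResponseCycles = 0;

        void cycleWithNoResponse() { ++noResponseCycles; }
        void responseSeen()        { noResponseCycles = 0; }

        bool deadlocked() const { return noResponseCycles >= threshold; }
    };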
Later today I'll check in a patch that increases the ruby mem test deadlock threshold.

Brad

From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Korey Sewell
Sent: Monday, February 07, 2011 2:27 PM
To: M5 Developer List
Subject: Re: [m5-dev] changeset in m5: Ruby: Fixes MESI CMP directory protocol

Another followup on this is that the "deadlock_threshold" parameter doesn't propagate to the MemTester CPU. So when I'm testing 64 CPUs, memtester.cc still has this code:

"
    if (!tickEvent.scheduled())
        schedule(tickEvent, curTick() + ticks(1));

    if (++noResponseCycles >= 500000) {
        if (issueDmas) {
            cerr << "DMA tester ";
        }
        cerr << name() << ": deadlocked at cycle " << curTick() << endl;
        fatal("");
    }
"

That hardcoded 500000 is not a great number (as people have said) because, as your topologies and memory hierarchies change, the maximum number of cycles you may have to wait for a response can also change, right? Increasing that number by hand is an arduous thing to do, so maybe it should come from a parameter, and maybe we should also "warn" there that a deadlock is possible after some inordinate wait time. The fix could be simply to warn about a long wait after an inordinate period... something like this, I think:

"
    if (++noResponseCycles % 500000 == 0) {
        warn("cpu X has waited for %i cycles", noResponseCycles);
    }
"

Lastly, should the memtester really send out a memory access on every tick? The actual injection rate could be much higher than the rate at which we resolve contention. Maybe we should consider allowing X outstanding requests per CPU as a more realistic measure that can stress the system without making the noResponseCycles count grow to such a high number.
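A minimal, self-contained sketch of that last idea, assuming a simple per-CPU cap on outstanding requests; the names and the cap of 16 are illustrative only, not the actual MemTester code:

    #include <cstdio>

    // Toy model of a tester CPU that caps its outstanding requests instead of
    // issuing a new access unconditionally every tick.
    struct ThrottledTester {
        unsigned maxOutstanding;   // would come from a configuration parameter
        unsigned outstanding;

        // Called every tick; returns true if a new request may be issued.
        bool tryIssue() {
            if (outstanding >= maxOutstanding)
                return false;      // back off until some responses return
            ++outstanding;
            return true;
        }

        // Called whenever a response comes back.
        void complete() { if (outstanding > 0) --outstanding; }
    };

    int main() {
        ThrottledTester cpu{16, 0};
        unsigned issued = 0, blocked = 0;
        for (int tick = 0; tick < 100; ++tick) {
            if (cpu.tryIssue()) ++issued; else ++blocked;
            if (tick % 4 == 0) cpu.complete();   // pretend a response every 4 ticks
        }
        std::printf("issued %u requests, throttled on %u ticks\n", issued, blocked);
        return 0;
    }

The intent is to keep stressing the system while letting the injection rate track the rate at which responses actually come back.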
On Mon, Feb 7, 2011 at 1:27 PM, Beckmann, Brad <brad.beckm...@amd.com> wrote:
Yep, if I increase the deadlock threshold to 5 million cycles, the deadlock warning is not encountered. However, I don't think that we should increase the default deadlock threshold by an order of magnitude. Instead, let's just increase the threshold for the mem tester. How about I check in the following small patch?

Brad

diff --git a/configs/example/ruby_mem_test.py b/configs/example/ruby_mem_test.py
--- a/configs/example/ruby_mem_test.py
+++ b/configs/example/ruby_mem_test.py
@@ -135,6 +135,12 @@
     cpu.test = system.ruby.cpu_ruby_ports[i].port
     cpu.functional = system.funcmem.port

+    #
+    # Since the memtester is incredibly bursty, increase the deadlock
+    # threshold to 5 million cycles
+    #
+    system.ruby.cpu_ruby_ports[i].deadlock_threshold = 5000000
+
 for (i, dma) in enumerate(dmas):
     #
     # Tie the dma memtester ports to the correct functional port
diff --git a/tests/configs/memtest-ruby.py b/tests/configs/memtest-ruby.py
--- a/tests/configs/memtest-ruby.py
+++ b/tests/configs/memtest-ruby.py
@@ -96,6 +96,12 @@
 #
     cpus[i].test = ruby_port.port
     cpus[i].functional = system.funcmem.port
+
+    #
+    # Since the memtester is incredibly bursty, increase the deadlock
+    # threshold to 5 million cycles
+    #
+    ruby_port.deadlock_threshold = 5000000

 # -----------------------
 # run simulation

> -----Original Message-----
> From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org]
> On Behalf Of Nilay Vaish
> Sent: Monday, February 07, 2011 9:12 AM
> To: M5 Developer List
> Subject: Re: [m5-dev] changeset in m5: Ruby: Fixes MESI CMP directory
> protocol
>
> Brad, I also see the protocol getting into a deadlock. I tried to get a
> trace, but I get a segmentation fault (yes, the segmentation fault only
> occurs when the trace flag ProtocolTrace is supplied). It seems to me that
> memory is getting corrupted somewhere, because the fault occurs in malloc
> itself.
>
> It could be that the protocol is actually not in a deadlock. Both Arka and
> I had increased the deadlock threshold while testing the protocol. I will
> try with an increased threshold later in the day.
>
> One more thing: the Orion 2.0 code that was committed last night makes use
> of printf(). It did not compile cleanly for me. I had to change it to
> fatal() and include the header file base/misc.hh.
>
> --
> Nilay
>
> On Mon, 7 Feb 2011, Beckmann, Brad wrote:
>
> > FYI... if my local regression tests are correct, this patch does not
> > fix all the problems with the MESI_CMP_directory protocol. One of the
> > patches I just checked in fixes a subtle bug in the ruby_mem_test.
> > Fixing this bug exposes more deadlock problems in the
> > MESI_CMP_directory protocol.
> >
> > To reproduce the regression tester's sequencer deadlock error, set the
> > Randomization flag to false in the file
> > configs/example/ruby_mem_test.py, then run the following command:
> >
> > build/ALPHA_SE_MESI_CMP_directory/m5.debug
> > configs/example/ruby_mem_test.py -n 8
> >
> > Let me know if you have any questions,
> >
> > Brad
> >
> >
> >> -----Original Message-----
> >> From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On
> >> Behalf Of Nilay Vaish
> >> Sent: Thursday, January 13, 2011 8:50 PM
> >> To: m5-dev@m5sim.org
> >> Subject: [m5-dev] changeset in m5: Ruby: Fixes MESI CMP directory
> >> protocol
> >>
> >> changeset 8f37a23e02d7 in /z/repo/m5
> >> details: http://repo.m5sim.org/m5?cmd=changeset;node=8f37a23e02d7
> >> description:
> >>     Ruby: Fixes MESI CMP directory protocol
> >>     The current implementation of MESI CMP directory protocol is broken.
> >>     This patch, from Arkaprava Basu, fixes the protocol.
> >>
> >> diffstat:
> >>

--
- Korey