Another follow-up on this is that the "deadlock_threshold" parameter doesn't propagate to the MemTester CPU.
So when I'm testing 64 CPUs, memtester.cc still has this code:

    if (!tickEvent.scheduled())
        schedule(tickEvent, curTick() + ticks(1));

    if (++noResponseCycles >= 500000) {
        if (issueDmas) {
            cerr << "DMA tester ";
        }
        cerr << name() << ": deadlocked at cycle " << curTick() << endl;
        fatal("");
    }

That hardcoded 500000 is not a great number (as people have said), because as your topologies and memory hierarchies change, the maximum number of cycles you have to wait for a response can also change, right? Increasing that number by hand is an arduous thing to do, so maybe it should come from a parameter, and maybe we should just "warn" there that a deadlock is possible after some inordinately long wait. In other words, the fix should simply be to warn about a long wait after an inordinate period. Something like this, I think:

    if (++noResponseCycles % 500000 == 0) {
        warn("cpu X has waited for %i cycles", noResponseCycles);
    }

Lastly, should the memtester really send out a memory access on every tick? The actual injection rate could be much higher than the rate at which we resolve contention. Maybe we should consider allowing X outstanding requests per CPU as a more realistic load that still stresses the system but doesn't let the noResponseCycles count grow to such a high number.
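For the injection-rate question, the shape I have in mind is something like the following. This is only a sketch: maxOutstanding and numOutstanding don't exist in memtest.cc today, they are just illustrative names for a cap parameter and an in-flight counter that would have to be decremented wherever the tester retires a response.

    // Hypothetical per-CPU throttle in MemTest::tick(), after the tick event
    // has been rescheduled but before a new access is generated: once we hit
    // the cap, skip this tick and let outstanding requests drain.
    if (numOutstanding >= maxOutstanding)
        return;

    // On issuing a request:      numOutstanding++;
    // On completing a response:  numOutstanding--;

A cap like that still keeps pressure on the memory system, but noResponseCycles stops ballooning just because we ask for far more per tick than the hierarchy can possibly retire.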
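And to make the threshold point concrete, here is a rough, untested sketch of the memtest.cc side if the limit came from a parameter instead of the literal 500000. The deadlockThreshold member and the deadlock_threshold Param.Int it would be copied from are made-up names for illustration; warn() (base/misc.hh) is the only existing machinery assumed here.

    // Assumed plumbing (not in the tree today): MemTest.py would gain
    //     deadlock_threshold = Param.Int(500000,
    //         "cycles without a response before the tester starts warning")
    // and the MemTest constructor would copy it into deadlockThreshold.
    //
    // MemTest::tick() would then warn periodically instead of calling fatal():
    if (!tickEvent.scheduled())
        schedule(tickEvent, curTick() + ticks(1));

    if (++noResponseCycles % deadlockThreshold == 0) {
        warn("%s%s: no response for %d cycles -- possible deadlock",
             issueDmas ? "DMA tester " : "", name(), noResponseCycles);
    }

That way configs like ruby_mem_test.py could hand the same number to both the Ruby ports and the testers, and a run that is merely slow keeps going while still leaving a trail in the output.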
On Mon, Feb 7, 2011 at 1:27 PM, Beckmann, Brad <brad.beckm...@amd.com> wrote:

> Yep, if I increase the deadlock threshold to 5 million cycles, the
> deadlock warning is not encountered. However, I don't think that we
> should increase the default deadlock threshold by an order of magnitude.
> Instead, let's just increase the threshold for the mem tester. How about
> I check in the following small patch?
>
> Brad
>
>
> diff --git a/configs/example/ruby_mem_test.py b/configs/example/ruby_mem_test.py
> --- a/configs/example/ruby_mem_test.py
> +++ b/configs/example/ruby_mem_test.py
> @@ -135,6 +135,12 @@
>      cpu.test = system.ruby.cpu_ruby_ports[i].port
>      cpu.functional = system.funcmem.port
>
> +    #
> +    # Since the memtester is incredibly bursty, increase the deadlock
> +    # threshold to 5 million cycles
> +    #
> +    system.ruby.cpu_ruby_ports[i].deadlock_threshold = 5000000
> +
>  for (i, dma) in enumerate(dmas):
>      #
>      # Tie the dma memtester ports to the correct functional port
> diff --git a/tests/configs/memtest-ruby.py b/tests/configs/memtest-ruby.py
> --- a/tests/configs/memtest-ruby.py
> +++ b/tests/configs/memtest-ruby.py
> @@ -96,6 +96,12 @@
>      #
>      cpus[i].test = ruby_port.port
>      cpus[i].functional = system.funcmem.port
> +
> +    #
> +    # Since the memtester is incredibly bursty, increase the deadlock
> +    # threshold to 5 million cycles
> +    #
> +    ruby_port.deadlock_threshold = 5000000
>
>  # -----------------------
>  # run simulation
>
>
>
>
> > -----Original Message-----
> > From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org]
> > On Behalf Of Nilay Vaish
> > Sent: Monday, February 07, 2011 9:12 AM
> > To: M5 Developer List
> > Subject: Re: [m5-dev] changeset in m5: Ruby: Fixes MESI CMP directory
> > protocol
> >
> > Brad, I also see the protocol getting into a deadlock. I tried to get a
> > trace, but I get a segmentation fault (yes, the segmentation fault only
> > occurs when the trace flag ProtocolTrace is supplied). It seems to me
> > that memory is getting corrupted somewhere, because the fault occurs in
> > malloc itself.
> >
> > It could be that the protocol is actually not in a deadlock. Both Arka
> > and I had increased the deadlock threshold while testing the protocol.
> > I will try with an increased threshold later in the day.
> >
> > One more thing: the Orion 2.0 code that was committed last night makes
> > use of printf(). It did not compile cleanly for me. I had to change it
> > to fatal() and include the header file base/misc.hh.
> >
> > --
> > Nilay
> >
> > On Mon, 7 Feb 2011, Beckmann, Brad wrote:
> >
> > > FYI... if my local regression tests are correct, this patch does not
> > > fix all the problems with the MESI_CMP_directory protocol. One of the
> > > patches I just checked in fixes a subtle bug in the ruby_mem_test.
> > > Fixing this bug exposes more deadlock problems in the
> > > MESI_CMP_directory protocol.
> > >
> > > To reproduce the regression tester's sequencer deadlock error, set the
> > > Randomization flag to false in the file
> > > configs/example/ruby_mem_test.py, then run the following command:
> > >
> > > build/ALPHA_SE_MESI_CMP_directory/m5.debug
> > > configs/example/ruby_mem_test.py -n 8
> > >
> > > Let me know if you have any questions,
> > >
> > > Brad
> > >
> > >
> > >> -----Original Message-----
> > >> From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On
> > >> Behalf Of Nilay Vaish
> > >> Sent: Thursday, January 13, 2011 8:50 PM
> > >> To: m5-dev@m5sim.org
> > >> Subject: [m5-dev] changeset in m5: Ruby: Fixes MESI CMP directory
> > >> protocol
> > >>
> > >> changeset 8f37a23e02d7 in /z/repo/m5
> > >> details: http://repo.m5sim.org/m5?cmd=changeset;node=8f37a23e02d7
> > >> description:
> > >>         Ruby: Fixes MESI CMP directory protocol
> > >>         The current implementation of MESI CMP directory protocol is
> > >>         broken. This patch, from Arkaprava Basu, fixes the protocol.
> > >>
> > >> diffstat:
> > >>
> >
> > _______________________________________________
> > m5-dev mailing list
> > m5-dev@m5sim.org
> > http://m5sim.org/mailman/listinfo/m5-dev
>
> _______________________________________________
> m5-dev mailing list
> m5-dev@m5sim.org
> http://m5sim.org/mailman/listinfo/m5-dev

--
- Korey