Another follow-up on this: the "deadlock_threshold" parameter doesn't
propagate to the MemTester CPU.

So when I'm testing 64 CPUs, memtest.cc still has this code:
 "   if (!tickEvent.scheduled())
        schedule(tickEvent, curTick() + ticks(1));

    if (++noResponseCycles >= 500000) {
        if (issueDmas) {
            cerr << "DMA tester ";
        }
        cerr << name() << ": deadlocked at cycle " << curTick() << endl;
        fatal("");
    }
"

That hardcoded 500000 is not a great number (as others have said): as your
topology or memory hierarchy changes, the maximum number of cycles you might
have to wait for a response changes too.

Increasing that number by hand is an arduous thing to do, so it should
probably come from a parameter, and maybe we should just warn there that a
deadlock is possible after some inordinate wait time.

The fix should just be to warn about a long wait after an inordinate
period. Something like this, I think:
"
if (++noResponseCycles % 500000 == 0) {
    warn("%s has waited for %i cycles", name(), noResponseCycles);
}
"

Lastly, should the memtester really send out a memory access on every tick?
The injection rate can end up much higher than the rate at which the system
resolves contention.

Maybe we should instead allow at most X outstanding requests per CPU; that
would still stress the system but wouldn't let the noResponseCycles counter
grow to such a high number.
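Something along these lines, purely as a sketch: maxOutstanding and
outstandingRequests are made-up names for a parameter and a counter that
MemTest would have to maintain (incremented when a request is sent,
decremented in the completion path).
"
void
MemTest::tick()
{
    if (!tickEvent.scheduled())
        schedule(tickEvent, curTick() + ticks(1));

    // Don't inject another access while too many are already in flight.
    if (outstandingRequests >= maxOutstanding)
        return;

    // ... existing code that picks an address and issues the request ...
}
"
With a cap like that, responses should come back sooner, so noResponseCycles
would stay small unless something is genuinely wedged.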

On Mon, Feb 7, 2011 at 1:27 PM, Beckmann, Brad <brad.beckm...@amd.com>wrote:

> Yep, if I increase the deadlock threshold to 5 million cycles, the deadlock
> warning is not encountered.  However, I don't think that we should increase
> the default deadlock threshold by an order of magnitude.  Instead, let's
> just increase the threshold for the mem tester.  How about I check in the
> following small patch.
>
> Brad
>
>
> diff --git a/configs/example/ruby_mem_test.py b/configs/example/ruby_mem_test.py
> --- a/configs/example/ruby_mem_test.py
> +++ b/configs/example/ruby_mem_test.py
> @@ -135,6 +135,12 @@
>     cpu.test = system.ruby.cpu_ruby_ports[i].port
>     cpu.functional = system.funcmem.port
>
> +    #
> +    # Since the memtester is incredibly bursty, increase the deadlock
> +    # threshold to 5 million cycles
> +    #
> +    system.ruby.cpu_ruby_ports[i].deadlock_threshold = 5000000
> +
>  for (i, dma) in enumerate(dmas):
>     #
>     # Tie the dma memtester ports to the correct functional port
> diff --git a/tests/configs/memtest-ruby.py b/tests/configs/memtest-ruby.py
> --- a/tests/configs/memtest-ruby.py
> +++ b/tests/configs/memtest-ruby.py
> @@ -96,6 +96,12 @@
>      #
>      cpus[i].test = ruby_port.port
>      cpus[i].functional = system.funcmem.port
> +
> +     #
> +     # Since the memtester is incredibly bursty, increase the deadlock
> +     # threshold to 5 million cycles
> +     #
> +     ruby_port.deadlock_threshold = 5000000
>
>  # -----------------------
>  # run simulation
>
>
>
> > -----Original Message-----
> > From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org]
> > On Behalf Of Nilay Vaish
> > Sent: Monday, February 07, 2011 9:12 AM
> > To: M5 Developer List
> > Subject: Re: [m5-dev] changeset in m5: Ruby: Fixes MESI CMP directory
> > protocol
> >
> > Brad, I also see the protocol getting into a deadlock. I tried to get a
> > trace, but I get a segmentation fault (yes, the segmentation fault only
> > occurs when the trace flag ProtocolTrace is supplied). It seems to me
> > that memory is getting corrupted somewhere, because the fault occurs in
> > malloc itself.
> >
> > It could be that the protocol is actually not in a deadlock. Both Arka
> > and I had increased the deadlock threshold while testing the protocol.
> > I will try with an increased threshold later in the day.
> >
> > One more thing: the Orion 2.0 code that was committed last night makes
> > use of printf(). It did not compile cleanly for me; I had to change it
> > to fatal() and include the header file base/misc.hh.
> >
> > --
> > Nilay
> >
> > On Mon, 7 Feb 2011, Beckmann, Brad wrote:
> >
> > > FYI... if my local regression tests are correct, this patch does not
> > > fix all the problems with the MESI_CMP_directory protocol.  One of the
> > > patches I just checked in fixes a subtle bug in the ruby_mem_test.
> > > Fixing this bug exposes more deadlock problems in the
> > > MESI_CMP_directory protocol.
> > >
> > > To reproduce the regression tester's sequencer deadlock error, set the
> > > Randomization flag to false in the file
> > > configs/example/ruby_mem_test.py then run the following command:
> > >
> > > build/ALPHA_SE_MESI_CMP_directory/m5.debug
> > > configs/example/ruby_mem_test.py -n 8
> > >
> > > Let me know if you have any questions,
> > >
> > > Brad
> > >
> > >
> > >> -----Original Message-----
> >> From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org]
> >> On Behalf Of Nilay Vaish
> > >> Sent: Thursday, January 13, 2011 8:50 PM
> > >> To: m5-dev@m5sim.org
> > >> Subject: [m5-dev] changeset in m5: Ruby: Fixes MESI CMP directory
> > >> protocol
> > >>
> > >> changeset 8f37a23e02d7 in /z/repo/m5
> > >> details: http://repo.m5sim.org/m5?cmd=changeset;node=8f37a23e02d7
> > >> description:
> > >>    Ruby: Fixes MESI CMP directory protocol
> >>    The current implementation of MESI CMP directory protocol is broken.
> > >>    This patch, from Arkaprava Basu, fixes the protocol.
> > >>
> > >> diffstat:
> > >>
> >
>



-- 
- Korey
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
