Nilay,

If the same test completes with a larger threshold, then it is almost certainly a false positive and NOT a deadlock (though it may be a case of starvation). If it were actually a deadlock, the simulation would simply have reported it after running somewhat longer.
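To make the distinction concrete: the detection is just a per-request watchdog. Here is a minimal Python sketch of the idea (hypothetical names, not the actual Ruby Sequencer code):

def check_for_deadlock(outstanding_requests, cur_cycle, threshold):
    # outstanding_requests: dict mapping address -> cycle at which the
    # request was issued. Any request pending longer than the threshold
    # is flagged as a possible deadlock.
    for addr, issue_cycle in outstanding_requests.items():
        if cur_cycle - issue_cycle >= threshold:
            raise RuntimeError("possible deadlock: request for %#x "
                               "outstanding for %d cycles"
                               % (addr, cur_cycle - issue_cycle))

A real deadlock never makes progress, so any finite threshold eventually flags it; a bursty but live workload only needs the threshold raised past its worst-case service latency.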

On extending stall and wait to other protocols, you are absolutely correct. Many of the starvation issues (and thus the perceived deadlocks) show up due to unfairness in handling coherence requests. Once the protocol trace segmentation fault issue is solved, I can get MESI_CMP_directory to use stall and wait.
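The fairness problem is that a blocked request that is simply recycled back onto its input queue can be overtaken indefinitely by newer requests to the same address. Stall and wait instead parks it in a per-address buffer and replays it first once the address becomes serviceable. A minimal Python sketch of the idea (hypothetical structures; the real mechanism lives in Ruby's SLICC-generated code):

from collections import defaultdict, deque

wait_buffers = defaultdict(deque)  # address -> FIFO of stalled requests

def stall_and_wait(addr, request):
    # Park a request that cannot be serviced while addr is blocked,
    # instead of recycling it behind newer arrivals.
    wait_buffers[addr].append(request)

def wake_up_buffers(addr, input_queue):
    # input_queue is a deque consumed from the left. Replay the stalled
    # requests at its front, preserving their original arrival order, so
    # they are serviced before anything that arrived while they waited.
    while wait_buffers[addr]:
        input_queue.appendleft(wait_buffers[addr].pop())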

I fully agree with Brad's argument about bumping up the threshold for the testers. Having a large threshold (i.e., 5 M) does not hurt much: reporting a deadlock will take a bit more simulation time, but if there is an actual deadlock it will still be reported. So I would vote to stick with Brad's threshold number in the patch.

Thanks
Arka


On 02/07/2011 12:39 PM, Nilay Vaish wrote:
Brad,

I think 5,000,000 is a lot. IIRC, a million worked the last time I tested the protocol. We can check the patch in, though I am of the view that we should let it remain as is until we can generate the protocol trace and make sure that this is not an actual deadlock. I first need to find the cause of the segmentation fault, which occurs only when the trace is being collected.
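For reference, the trace I am trying to collect comes from an invocation along these lines (this assumes the M5-era --trace-flags option; the exact spelling may differ on your checkout):

build/ALPHA_SE_MESI_CMP_directory/m5.debug --trace-flags=ProtocolTrace \
    configs/example/ruby_mem_test.py -n 8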

Another issue is that we need to extend stall and wait to other protocols as well. This, I believe, may help in reducing such deadlock instances. While working on MESI CMP, I often saw earlier requests remain unfulfilled because of later requests to the same address.

--
Nilay

On Mon, 7 Feb 2011, Beckmann, Brad wrote:

Yep, if I increase the deadlock threshold to 5 million cycles, the deadlock warning is not encountered. However, I don't think that we should increase the default deadlock threshold by an order of magnitude. Instead, let's just increase the threshold for the mem tester. How about I check in the following small patch?

Brad


diff --git a/configs/example/ruby_mem_test.py b/configs/example/ruby_mem_test.py
--- a/configs/example/ruby_mem_test.py
+++ b/configs/example/ruby_mem_test.py
@@ -135,6 +135,12 @@
    cpu.test = system.ruby.cpu_ruby_ports[i].port
    cpu.functional = system.funcmem.port

+    #
+    # Since the memtester is incredibly bursty, increase the deadlock
+    # threshold to 5 million cycles
+    #
+    system.ruby.cpu_ruby_ports[i].deadlock_threshold = 5000000
+
for (i, dma) in enumerate(dmas):
    #
    # Tie the dma memtester ports to the correct functional port
diff --git a/tests/configs/memtest-ruby.py b/tests/configs/memtest-ruby.py
--- a/tests/configs/memtest-ruby.py
+++ b/tests/configs/memtest-ruby.py
@@ -96,6 +96,12 @@
     #
     cpus[i].test = ruby_port.port
     cpus[i].functional = system.funcmem.port
+
+     #
+     # Since the memtester is incredibly bursty, increase the deadlock
+     # threshold to 5 million cycles
+     #
+     ruby_port.deadlock_threshold = 5000000

# -----------------------
# run simulation



-----Original Message-----
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Nilay Vaish
Sent: Monday, February 07, 2011 9:12 AM
To: M5 Developer List
Subject: Re: [m5-dev] changeset in m5: Ruby: Fixes MESI CMP directory protocol

Brad, I also see the protocol getting into a deadlock. I tried to get a trace, but I get a segmentation fault (yes, the segmentation fault occurs only when the ProtocolTrace trace flag is supplied). It seems to me that memory is getting corrupted somewhere, because the fault occurs in malloc itself.

It could be that the protocol is actually not in a deadlock. Both Arka and I had increased the deadlock threshold while testing the protocol. I will try with an increased threshold later in the day.

One more thing: the Orion 2.0 code that was committed last night makes use of printf(). It did not compile cleanly for me; I had to change it to fatal() and include the header file base/misc.hh.

--
Nilay

On Mon, 7 Feb 2011, Beckmann, Brad wrote:

FYI... if my local regression tests are correct, this patch does not
fix all the problems with the MESI_CMP_directory protocol. One of the
patches I just checked in fixes a subtle bug in the ruby_mem_test.
Fixing this bug exposes more deadlock problems in the
MESI_CMP_directory protocol.

To reproduce the regression tester's sequencer deadlock error, set the
Randomization flag to false in configs/example/ruby_mem_test.py,
then run the following command:

build/ALPHA_SE_MESI_CMP_directory/m5.debug configs/example/ruby_mem_test.py -n 8

Let me know if you have any questions,

Brad


-----Original Message-----
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Nilay Vaish
Sent: Thursday, January 13, 2011 8:50 PM
To: m5-dev@m5sim.org
Subject: [m5-dev] changeset in m5: Ruby: Fixes MESI CMP directory protocol

changeset 8f37a23e02d7 in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=8f37a23e02d7
description:
    Ruby: Fixes MESI CMP directory protocol
    The current implementation of MESI CMP directory protocol is broken.
    This patch, from Arkaprava Basu, fixes the protocol.

diffstat:


_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
