I still haven't been able to fix the issues with checkpointing in atomic and restoring to either atomic or timing, so I'm just poking my head out here to see if anyone else has tackled the problem.
-Griffin Wright

On 4/10/2011 8:05 PM, Steve Reinhardt wrote:
Hi Leonard,

I do have a patch for this issue, but haven't gotten around to pushing it yet: http://reviews.m5sim.org/r/608/

If you have the chance to download it from reviewboard and verify that it solves your problem, that would be helpful.

Steve

On Thu, Apr 7, 2011 at 4:53 PM, Sage <[email protected]> wrote:

Hi, Steve and Rick,

You two discussed the problem of resuming a checkpoint when --l2cache and --detailed are both specified. But I noticed that the problem occurs when --l2cache and (--timing or --detailed) are specified, but it won't happen if only the "--l2cache" option is there. Have you figured out a way of solving the problem?

Thanks,
Leonard

On Wed, Mar 2, 2011 at 9:32 AM, Steve Reinhardt <[email protected]> wrote:

FYI, I finally got around to reproducing this, and I think I see what the problem is. Unfortunately I don't see a really trivial fix, but I've got some ideas I'll work on to see if I can take care of it.

Steve

On Fri, Feb 18, 2011 at 5:21 AM, Steve Reinhardt <[email protected]> wrote:

BTW, thanks for the detailed example... I've been traveling, but I'll see if I can reproduce this when I get home.

Steve

On Thu, Feb 17, 2011 at 11:21 AM, Richard Strong <[email protected]> wrote:

Here is the process I went through on a fresh checkout of m5 this morning.
(1) hg clone http://repo.m5sim.org/m5
(2) cd m5
(3) scons build/ALPHA_SE/m5.opt
(4) build/ALPHA_SE/m5.opt configs/example/se.py --take-checkpoint=1 --at-instruction
(5) build/ALPHA_SE/m5.opt configs/example/se.py --checkpoint-restore=1 --at-instruction -d --caches --l2cache

M5 Simulator System

Copyright (c) 2001-2008
The Regents of The University of Michigan
All Rights Reserved

M5 compiled Feb 17 2011 09:41:58
M5 revision 96bde0910197+ 8031+ default tip
M5 started Feb 17 2011 09:54:32
M5 executing on rstrong-desktop
command line: build/ALPHA_SE/m5.opt configs/example/se.py --checkpoint-restore=1 --at-instruction -d --caches --l2cache
Global frequency set at 1000000000000 ticks per second
0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
Switch at curTick count:10000
info: Entering event queue @ 1000. Starting simulation...
panic: Tried to access unmapped address 0x12008b488.
 @ cycle 2500
[invoke:build/ALPHA_SE/arch/alpha/faults.cc, line 208]
Memory Usage: 586300 KBytes
For more information see: http://www.m5sim.org/panic/5932f339
Program aborted at cycle 2500
Aborted

The problem seen in the output of (5) above is caused by the workload being adopted by switch_cpus as its parent, as opposed to system.cpu. My original fix was to modify simulate.py to adopt orphans in sorted order, but this appears to create orphans for fuPool, as shown in the snippet of config.ini below. This makes me think that something is broken in the design, since whether certain objects become orphans, and whether checkpoint files work, depends on the order in which objects come up. Is there any way to explicitly set the parent-child relationship to avoid this nondeterminism?
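The order dependence Rick describes can be reproduced in a small toy model. The sketch below is not m5 code: the Node class and its add_child and adopt_orphan_params methods are hypothetical stand-ins that only mimic the rule "a param value without a parent is adopted by whichever object visits it first". It assumes two CPU objects that both reference the same workload as a param.

```python
class Node:
    """Toy stand-in for a config object; NOT the real m5 SimObject."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = {}  # child name -> Node (ownership)
        self.params = {}    # param name -> Node (a reference only)

    def add_child(self, name, child):
        if child.parent is not None:
            raise RuntimeError(
                f"{child.name} already has parent {child.parent.name}")
        child.parent = self
        self.children[name] = child

    def adopt_orphan_params(self):
        # Any param value that still has no parent becomes our child.
        for pname, value in self.params.items():
            if value.parent is None:
                self.add_child(pname, value)


def build():
    """Two CPUs that both point at the same workload as a param."""
    workload = Node("workload")
    cpu = Node("cpu")
    switch_cpu = Node("switch_cpu")
    cpu.params["workload"] = workload
    switch_cpu.params["workload"] = workload
    return workload, cpu, switch_cpu


def adopt_in_order(nodes):
    for node in nodes:
        node.adopt_orphan_params()


# Order A: cpu is visited first and becomes the parent.
workload_a, cpu_a, switch_a = build()
adopt_in_order([cpu_a, switch_a])
parent_a = workload_a.parent.name

# Order B: switch_cpu is visited first and becomes the parent instead.
workload_b, cpu_b, switch_b = build()
adopt_in_order([switch_b, cpu_b])
parent_b = workload_b.parent.name

# Parenting explicitly before any adoption pass removes the dependence
# on visiting order.
workload_c, cpu_c, switch_c = build()
cpu_c.add_child("workload", workload_c)
adopt_in_order([switch_c, cpu_c])
parent_c = workload_c.parent.name

print(parent_a, parent_b, parent_c)  # cpu switch_cpu cpu
```

In this toy model, the only thing that changes between the first two runs is the visiting order, which is enough to move the workload under a different parent. Whether explicit parenting is the right fix for the real configuration scripts is a separate question that this sketch does not answer.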
config.ini selected output:

[system.switch_cpus.fuPool]
type=FUPool
FUList=(orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan)

On Thu, Feb 17, 2011 at 5:45 AM, Steve Reinhardt <[email protected]> wrote:

Hi Rick,

I'm a little confused by your statement "there is no recursion to add the children of params". Being param value and being a child are separate things, because an object A can be a param of many other objects but can only be the child of one other object. The only relationship between the two is that if A is set as a param value for a param of B and A does not have a parent, then A will also implicitly be set as a child of B. (See towards the end of SimObject.__setattr__().) So every SimObject param value *should* be the child of *some* SimObject, so iterating over param values shouldn't be necessary. The whole point of adoptOrphanParams() is to make sure this is true; it's the one place we iterate over all the param values, just to make sure that they all have parents (and to set them if they don't).

Also, the adoptOrphanParams() method traverses the whole tree (see simulate.py) using the descendants() call which is a pre-order traversal, so any new children that are added at a particular node should be traversed automatically.

Your configuration should not be affected by whether you're restoring from a checkpoint or not... the config gets built first, then if there's a checkpoint it gets restored.

I rewrote all this code last summer to clean it up, so I'm very interested in figuring out where the bugs are.

Steve

On Wed, Feb 16, 2011 at 9:48 PM, Richard Strong <[email protected]> wrote:

I took a close look at this problem because the same thing happens to me. It only occurs when I use the O3CPU model when resuming from a checkpoint. What I find is that config.ini has orphan for the FUList parameter of the O3CPU model. Further, none of the function units are adopted by fuPool.
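Steve's point that a pre-order traversal picks up children adopted along the way, without explicit recursion over params, can be illustrated with a toy generator. This is a sketch under stated assumptions, not the m5 implementation: the Node class, its fields, and the object names fu0 and fu1 are hypothetical, and only the traversal idea is taken from the description above.

```python
class Node:
    """Toy config object; NOT the real m5 SimObject."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []
        self.params = []  # param values that are themselves Nodes

    def adopt_orphan_params(self):
        # Give every parentless param value this node as its parent.
        for value in self.params:
            if value.parent is None:
                value.parent = self
                self.children.append(value)

    def descendants(self):
        # Pre-order: yield this node first, then walk its children.
        # The children list is read only after the caller has handled
        # this node, so children adopted during that step are visited.
        yield self
        for child in self.children:
            yield from child.descendants()


# root owns cpu explicitly; fuPool and the function units are only
# reachable as param values and start out without parents.
root = Node("root")
cpu = Node("cpu")
cpu.parent = root
root.children.append(cpu)

fu_pool = Node("fuPool")
fu_units = [Node("fu0"), Node("fu1")]
fu_pool.params = list(fu_units)
cpu.params = [fu_pool]

visited = []
for node in root.descendants():
    node.adopt_orphan_params()
    visited.append(node.name)

print(visited)
print([unit.parent.name for unit in fu_units])
```

In this model a single pass is enough to reach the nested function units, because each adoption happens before the traversal descends below the adopting node. If the real code visits nodes in a different order, or builds the list of nodes before adopting, the same guarantee would not follow from this sketch.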
I think the problem lies in SimObject.py::add_child(self, name, child) and SimObject.py::adoptOrphanParams(self). I think that there is no recursion to add the children of params. I tried a simple change at the end of add_child: calling adoptOrphanParams() on the child (change shown below). This allows the setup code to get further, but now I die with: "AttributeError: 'AnyProxy' object has no attribute 'getValue'". I was wondering if someone knows what is going wrong? Did a recent change forget to go down enough recursive levels when adopting children nodes?

Best,
-Rick

def add_child(self, name, child):
    print "\t in add_child name=%s child=%s"%(name, child)
    child = coerceSimObjectOrVector(child)
    if child.get_parent():
        raise RuntimeError, \
            "add_child('%s'): child '%s' already has parent '%s'" % \
            (name, child._name, child._parent)
    if self._children.has_key(name):
        # This code path had an undiscovered bug that would make it fail
        # at runtime. It had been here for a long time and was only
        # exposed by a buggy script. Changes here will probably not be
        # exercised without specialized testing.
        self.clear_child(name)
    child.set_parent(self, name)
    self._children[name] = child
    if isSimObjectVector(child):
        for obj in child:
            obj.adoptOrphanParams()
    elif isSimObjectOrVector(child):
        child.adoptOrphanParams()

On Fri, Feb 11, 2011 at 11:05 PM, Joel Hestness <[email protected]> wrote:

Hi Sheng,

I've dug back through some of my simulations, and I haven't been able to find a case where I used 4GB of simulated memory, so I don't know if I have a baseline to show that the checkpoint restore works with that much memory. On the other hand, I have simulated with 512MB and 1GB of simulated memory, and it has worked fine. For full-system simulations, we often mount a swap disk in the simulated system in order to avoid the small virtual memory constraints imposed by the operating system. I'd have to defer to others on the list for knowledge about whether that would work with SE mode.
I can attempt to address your other questions as well:

1) The way that you described the O3 parameters is how I have set them in the past, so that should work.

2) I've seen this problem before... It has had to do with the way that certain SimObjects are instantiated as children of other SimObjects at the beginning of the simulation, and with checkpoint restore, this isn't the cleanest process. When I ran into this problem, I was working on getting x86 timing mode working with Ruby, and Brad Beckmann was able to help me debug. He might be able to suggest first steps for figuring out what's wrong here.

Hope this helps,
Joel

On Wed, Feb 9, 2011 at 3:14 PM, Sheng Li <[email protected]> wrote:

And two other questions:

1. What should I do to change the O3 parameters such as issueWidth, commitWidth, etc.? I added a few lines in se.py as below. It runs fine if I just run the benchmarks, but if I resume a checkpoint (created without the -d option), it complains that the CPU class has no such parameters. I think these parameters can only be set after M5 performs the CPU mode switch, so how can I set them so that M5 will use them after switching CPU mode?

if options.detailed:
    CPUClass.commitWidth = 4
    CPUClass.decodeWidth = 4
    CPUClass.dispatchWidth = 4
    CPUClass.fetchWidth = 4
    CPUClass.issueWidth = 4
    CPUClass.commitWidth = 4
    CPUClass.renameWidth = 4
    CPUClass.squashWidth = 4
    CPUClass.wbWidth = 4
    CPUClass.numROBEntries = 128
    CPUClass.numIQEntries = 36
    CPUClass.LQEntries = 48

2. When I resume a checkpoint with the -d --caches options, I get "RuntimeError: Attempt to instantiate orphan node". I am trying to figure out what the orphan node is. What should I do to find the orphan node? I tried "print self.name" in File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 822, in getCCObject, but got nothing.
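One generic way to approach the "which node is the orphan?" question is to walk the configuration tree from the top and report every param value that has no parent, together with the path of the object that refers to it. The sketch below shows the idea on a toy tree. It is not m5 code: Node, path(), and find_orphan_params() are hypothetical helpers, and the object names merely echo the ones that appear in this thread.

```python
class Node:
    """Toy config object; NOT the real m5 SimObject."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.params = {}  # param name -> Node or list of Nodes
        if parent is not None:
            parent.children.append(self)

    def path(self):
        # Dotted path from the top of the tree down to this node.
        if self.parent is None:
            return self.name
        return self.parent.path() + "." + self.name


def find_orphan_params(root):
    """Return (owner_path, param_name, orphan_name) tuples for every
    param value that has no parent (the root itself is excluded)."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        for pname, value in node.params.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                if v.parent is None and v is not root:
                    found.append((node.path(), pname, v.name))
        # Reversed so that children are popped in their original order.
        stack.extend(reversed(node.children))
    return found


system = Node("system")
switch_cpus = Node("switch_cpus", system)
fu_pool = Node("fuPool", switch_cpus)
# The function units are referenced as a param but never parented.
fu_pool.params["FUList"] = [Node("fu0"), Node("fu1")]

for owner, pname, orphan in find_orphan_params(system):
    print(f"{owner}.{pname} refers to orphan {orphan}")
```

Reporting the path of the referring object rather than the orphan's own name is a deliberate choice here: an object without a parent has no meaningful path of its own, so the owner and param name are usually the more useful clue.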
command line: ./build/ALPHA_SE/m5.opt configs/example/se.py --bench bzip2 --checkpoint-restore=0 --simpoint -d --caches --l2cache 2200 m5out/cpt.bzip2.2200
Global frequency set at 1000000000000 ticks per second
Traceback (most recent call last):
  File "<string>", line 1, in ?
  File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/main.py", line 359, in main
    exec filecode in scope
  File "configs/example/se.py", line 179, in ?
    Simulation.run(options, root, system, FutureClass)
  File "/afs/crc.nd.edu/user/s/sli2/m5-work-stable/configs/common/Simulation.py", line 236, in run
    m5.instantiate(checkpoint_dir)
  File "/afs/crc.nd.edu/user/s/sli2/m5-work-stable/src/python/m5/simulate.py", line 77, in instantiate
    for obj in root.descendants(): obj.createCCObject()
  File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 841, in createCCObject
    def createCCObject(self):
  File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 796, in getCCParams
    value = value.getValue()
  File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 845, in getValue
    def getValue(self):
  File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 826, in getCCObject
    self._ccObject = -1
  File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 796, in getCCParams
    value = value.getValue()
  File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/params.py", line 183, in getValue
    return [ v.getValue() for v in self ]
  File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 845, in getValue
    def getValue(self):
  File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 822, in getCCObject
    #print self.name
RuntimeError: Attempt to instantiate orphan node

Thanks a lot!
-Sheng

_______________________________________________
m5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users

--
Give our ability to our work, but our genius to our life!
