BTW, thanks for the detailed example... I've been traveling, but I'll see if I can reproduce this when I get home.
Steve

On Thu, Feb 17, 2011 at 11:21 AM, Richard Strong <[email protected]> wrote:

> Here is the process I went through on a fresh checkout of m5 this morning.
>
> (1) hg clone http://repo.m5sim.org/m5
>
> (2) cd m5
>
> (3) scons build/ALPHA_SE/m5.opt
>
> (4) build/ALPHA_SE/m5.opt configs/example/se.py --take-checkpoint=1
> --at-instruction
>
> (5) build/ALPHA_SE/m5.opt configs/example/se.py --checkpoint-restore=1
> --at-instruction -d --caches --l2cache
> M5 Simulator System
>
> Copyright (c) 2001-2008
> The Regents of The University of Michigan
> All Rights Reserved
>
>
> M5 compiled Feb 17 2011 09:41:58
> M5 revision 96bde0910197+ 8031+ default tip
> M5 started Feb 17 2011 09:54:32
> M5 executing on rstrong-desktop
> command line: build/ALPHA_SE/m5.opt configs/example/se.py
> --checkpoint-restore=1 --at-instruction -d --caches --l2cache
>
> Global frequency set at 1000000000000 ticks per second
> 0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
> Switch at curTick count:10000
> info: Entering event queue @ 1000. Starting simulation...
> panic: Tried to access unmapped address 0x12008b488.
> @ cycle 2500
> [invoke:build/ALPHA_SE/arch/alpha/faults.cc, line 208]
> Memory Usage: 586300 KBytes
> For more information see: http://www.m5sim.org/panic/5932f339
> Program aborted at cycle 2500
> Aborted
>
> The problem seen in the output of (5) above is caused by the workload being
> adopted by switch_cpus as its parent as opposed to system.cpu. My original
> fix was to modify simulate.py to adopt orphans in sorted order, but this
> appears to create orphans for fuPool as shown in the snippet of config.ini
> below. This makes me think that something is broken in the design, as it
> depends on the order in which objects come up whether certain objects become
> orphans or whether checkpoint files work. Is there any way to explicitly set
> the parent/child relationship if you want to avoid this nondeterminism?
>
> config.ini selected output:
> [system.switch_cpus.fuPool]
> type=FUPool
> FUList=(orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan)
> (orphan) (orphan)
>
>
> On Thu, Feb 17, 2011 at 5:45 AM, Steve Reinhardt <[email protected]> wrote:
>
>> Hi Rick,
>>
>> I'm a little confused by your statement "there is no recursion to add the
>> children of params". Being param value and being a child are separate
>> things, because an object A can be a param of many other objects but can
>> only be the child of one other object. The only relationship between the
>> two is that if A is set as a param value for a param of B and A does not
>> have a parent, then A will also implicitly be set as a child of B. (See
>> towards the end of SimObject.__setattr__().)
>>
>> So every SimObject param value *should* be the child of *some* SimObject,
>> so iterating over param values shouldn't be necessary. The whole point of
>> adoptOrphanParams() is to make sure this is true; it's the one place we
>> iterate over all the param values, just to make sure that they all have
>> parents (and to set them if they don't).
>>
>> Also, the adoptOrphanParams() method traverses the whole tree (see
>> simulate.py) using the descendants() call which is a pre-order traversal, so
>> any new children that are added at a particular node should be traversed
>> automatically.
>>
>> Your configuration should not be affected by whether you're restoring from
>> a checkpoint or not... the config gets built first, then if there's a
>> checkpoint it gets restored.
>>
>> I rewrote all this code last summer to clean it up, so I'm very interested
>> in figuring out where the bugs are.
>>
>> Steve
>>
>>
>> On Wed, Feb 16, 2011 at 9:48 PM, Richard Strong <[email protected]> wrote:
>>
>>> I took a close look at this problem because the same thing happens to me.
>>> It only occurs when I use the O3CPU model when resuming from a checkpoint.
>>> What I find is that config.ini has orphan for the FUList parameter of the
>>> O3CPU model. Further, none of the function units are adopted by fuPool. I
>>> think the problem lies in SimObject.py::add_child(self, name, child) and
>>> SimObject.py::adoptOrphanParams(self). I think that there is no recursion
>>> to add the children of params. I tried a simple change at the end of
>>> add_child: calling adoptOrphanParams() on the child (change shown below).
>>> This allows the setup code to get further but now I die with:
>>>
>>> "AttributeError: 'AnyProxy' object has no attribute 'getValue'". I was
>>> wondering if someone knows what is going wrong? Did a recent change forget
>>> to go down enough recursive levels when adopting children nodes?
>>>
>>> Best,
>>> -Rick
>>>
>>> def add_child(self, name, child):
>>>     print "\t in add_child name=%s child=%s"%(name, child)
>>>     child = coerceSimObjectOrVector(child)
>>>     if child.get_parent():
>>>         raise RuntimeError, \
>>>             "add_child('%s'): child '%s' already has parent '%s'" % \
>>>             (name, child._name, child._parent)
>>>     if self._children.has_key(name):
>>>         # This code path had an undiscovered bug that would make it fail
>>>         # at runtime. It had been here for a long time and was only
>>>         # exposed by a buggy script. Changes here will probably not be
>>>         # exercised without specialized testing.
>>>         self.clear_child(name)
>>>     child.set_parent(self, name)
>>>     self._children[name] = child
>>>     if isSimObjectVector(child):
>>>         for obj in child:
>>>             obj.adoptOrphanParams()
>>>     elif isSimObjectOrVector(child):
>>>         child.adoptOrphanParams()
>>>
>>>>
>>>> On Fri, Feb 11, 2011 at 11:05 PM, Joel Hestness <[email protected]> wrote:
>>>>
>>>>> Hi Sheng,
>>>>> I've dug back through some of my simulations, and I haven't been able
>>>>> to find a case where I used 4GB of simulated memory, so I don't know if I
>>>>> have a baseline to show that the checkpoint restore works with that much
>>>>> memory.
>>>>> On the other hand, I have simulated with 512MB and 1GB of simulated
>>>>> memory, and it has worked fine. For full-system simulations, we often
>>>>> mount a swap disk in the simulated system in order to avoid the small
>>>>> virtual memory constraints imposed by the operating system. I'd have to
>>>>> defer to others on the list for knowledge about whether that would work
>>>>> with SE mode.
>>>>> I can attempt to address your other questions as well:
>>>>> 1) The way that you described the O3 parameters is how I have set
>>>>> them in the past, so that should work.
>>>>> 2) I've seen this problem before... It has had to do with the way
>>>>> that certain SimObjects are instantiated as children of other SimObjects
>>>>> at the beginning of the simulation, and with checkpoint restore, this
>>>>> isn't the cleanest process. When I ran into this problem, I was working
>>>>> on getting x86 timing mode working with Ruby, and Brad Beckmann was able
>>>>> to help me debug. He might be able to suggest first steps for figuring
>>>>> out what's wrong here.
>>>>> Hope this helps,
>>>>> Joel
>>>>>
>>>>>
>>>>> On Wed, Feb 9, 2011 at 3:14 PM, Sheng Li <[email protected]> wrote:
>>>>>
>>>>>> And two other questions:
>>>>>>
>>>>>> 1. What should I do to change the O3 parameters such as issueWidth,
>>>>>> commitWidth, etc? I added a few lines in se.py as below. It runs fine if
>>>>>> I just run the benchmarks, but if I resume a checkpoint (created without
>>>>>> -d option), then it will complain the CPU class has no such parameters.
>>>>>> I think these parameters can only be set after M5 performs CPU mode
>>>>>> switch, then how can I set these parameters so that M5 will use them
>>>>>> after switching CPU mode?
>>>>>>
>>>>>> if options.detailed:
>>>>>>     CPUClass.commitWidth = 4
>>>>>>     CPUClass.decodeWidth = 4
>>>>>>     CPUClass.dispatchWidth = 4
>>>>>>     CPUClass.fetchWidth = 4
>>>>>>     CPUClass.issueWidth = 4
>>>>>>     CPUClass.commitWidth = 4
>>>>>>     CPUClass.renameWidth = 4
>>>>>>     CPUClass.squashWidth = 4
>>>>>>     CPUClass.wbWidth = 4
>>>>>>     CPUClass.numROBEntries = 128
>>>>>>     CPUClass.numIQEntries = 36
>>>>>>     CPUClass.LQEntries = 48
>>>>>>
>>>>>> 2. When I resume a checkpoint with -d --caches options, I got
>>>>>> RuntimeError: Attempt to instantiate orphan node. I am trying to figure
>>>>>> out what the orphan node is. What should I do to find the orphan node? I
>>>>>> tried "print self.name" in File
>>>>>> "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line
>>>>>> 822, in getCCObject, but got nothing.
>>>>>>
>>>>>>
>>>>>> command line: ./build/ALPHA_SE/m5.opt configs/example/se.py --bench
>>>>>> bzip2 --checkpoint-restore=0 --simpoint -d --caches --l2cache
>>>>>> 2200
>>>>>> m5out/cpt.bzip2.2200
>>>>>>
>>>>>> Global frequency set at 1000000000000 ticks per second
>>>>>> Traceback (most recent call last):
>>>>>>   File "<string>", line 1, in ?
>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/main.py", line 359, in main
>>>>>>     exec filecode in scope
>>>>>>   File "configs/example/se.py", line 179, in ?
>>>>>>     Simulation.run(options, root, system, FutureClass)
>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-work-stable/configs/common/Simulation.py", line 236, in run
>>>>>>     m5.instantiate(checkpoint_dir)
>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-work-stable/src/python/m5/simulate.py", line 77, in instantiate
>>>>>>     for obj in root.descendants(): obj.createCCObject()
>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 841, in createCCObject
>>>>>>     def createCCObject(self):
>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 796, in getCCParams
>>>>>>     value = value.getValue()
>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 845, in getValue
>>>>>>     def getValue(self):
>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 826, in getCCObject
>>>>>>     self._ccObject = -1
>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 796, in getCCParams
>>>>>>     value = value.getValue()
>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/params.py", line 183, in getValue
>>>>>>     return [ v.getValue() for v in self ]
>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 845, in getValue
>>>>>>     def getValue(self):
>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 822, in getCCObject
>>>>>>     #print self.name
>>>>>> RuntimeError: Attempt to instantiate orphan node
>>>>>>
>>>>>> Thanks a lot!
>>>>>> -Sheng
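
The order dependence discussed in this thread (a shared param value ends up as the child of whichever object claims it first during a pre-order adoption pass) can be illustrated with a toy model. This is only a minimal sketch under stated assumptions: the `Node` class, its `add_child`, `adopt_orphan_params`, and `descendants` methods, and the `build` helper are hypothetical stand-ins written for illustration, not M5's SimObject code, and the implicit adoption that happens when a param is assigned is left out for brevity.

```python
class Node:
    """Hypothetical stand-in for a configuration object; not M5's SimObject."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = {}  # child name -> Node (dicts keep insertion order)
        self.params = {}    # param name -> Node

    def add_child(self, name, child):
        # An object can be a param value of many nodes but the child of only one.
        if child.parent is not None:
            raise RuntimeError(
                "%s already has parent %s" % (child.name, child.parent.name))
        child.parent = self
        self.children[name] = child

    def adopt_orphan_params(self):
        # Make this node the parent of every param value that has no parent yet.
        for name, value in self.params.items():
            if value.parent is None:
                self.add_child(name, value)

    def descendants(self):
        # Pre-order traversal: this node first, then each child's subtree.
        # Children adopted by a node when it is visited are picked up as well,
        # because the loop over its children starts only after the yield.
        yield self
        for child in self.children.values():
            yield from child.descendants()


def build(order):
    """Attach CPUs to a root in the given order, all sharing one workload,
    run the adoption pass, and return the workload."""
    root = Node("root")
    workload = Node("workload")
    for cpu_name in order:
        cpu = Node(cpu_name)
        cpu.params["workload"] = workload  # shared param value, still an orphan
        root.add_child(cpu_name, cpu)
    for obj in root.descendants():
        obj.adopt_orphan_params()
    return workload


print(build(["cpu", "switch_cpus"]).parent.name)  # prints: cpu
print(build(["switch_cpus", "cpu"]).parent.name)  # prints: switch_cpus
```

In this toy model the only thing that changes between the two runs is the order in which the CPUs are attached, yet the workload ends up with a different parent, which is the kind of ordering sensitivity Rick reports. Whether the real code behaves this way for the same reason is not established by the sketch.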
_______________________________________________
m5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
