FYI, I finally got around to reproducing this, and I think I see what the problem is. Unfortunately I don't see a really trivial fix, but I've got some ideas I'll work on to see if I can take care of it.
Steve

On Fri, Feb 18, 2011 at 5:21 AM, Steve Reinhardt <[email protected]> wrote:
> BTW, thanks for the detailed example... I've been traveling, but I'll
> see if I can reproduce this when I get home.
>
> Steve
>
> On Thu, Feb 17, 2011 at 11:21 AM, Richard Strong <[email protected]> wrote:
>
>> Here is the process I went through on a fresh checkout of m5 this
>> morning.
>>
>> (1) hg clone http://repo.m5sim.org/m5
>>
>> (2) cd m5
>>
>> (3) scons build/ALPHA_SE/m5.opt
>>
>> (4) build/ALPHA_SE/m5.opt configs/example/se.py --take-checkpoint=1 --at-instruction
>>
>> (5) build/ALPHA_SE/m5.opt configs/example/se.py --checkpoint-restore=1 --at-instruction -d --caches --l2cache
>> M5 Simulator System
>>
>> Copyright (c) 2001-2008
>> The Regents of The University of Michigan
>> All Rights Reserved
>>
>>
>> M5 compiled Feb 17 2011 09:41:58
>> M5 revision 96bde0910197+ 8031+ default tip
>> M5 started Feb 17 2011 09:54:32
>> M5 executing on rstrong-desktop
>> command line: build/ALPHA_SE/m5.opt configs/example/se.py --checkpoint-restore=1 --at-instruction -d --caches --l2cache
>>
>> Global frequency set at 1000000000000 ticks per second
>> 0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
>> Switch at curTick count:10000
>> info: Entering event queue @ 1000. Starting simulation...
>> panic: Tried to access unmapped address 0x12008b488.
>> @ cycle 2500
>> [invoke:build/ALPHA_SE/arch/alpha/faults.cc, line 208]
>> Memory Usage: 586300 KBytes
>> For more information see: http://www.m5sim.org/panic/5932f339
>> Program aborted at cycle 2500
>> Aborted
>>
>> The problem seen in the output of (5) above is caused by the workload
>> being adopted by switch_cpus as its parent as opposed to system.cpu.
>> My original fix was to modify simulate.py to adopt orphans in sorted
>> order, but this appears to create orphans for fuPool, as shown in the
>> snippet of config.ini below. This makes me think that something is
>> broken in the design, since whether certain objects become orphans,
>> and whether checkpoint files work, depends on the order in which
>> objects come up. Is there any way to explicitly set the parent/child
>> relationship to avoid this nondeterminism?
>>
>> config.ini selected output:
>> [system.switch_cpus.fuPool]
>> type=FUPool
>> FUList=(orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan)
>>
>> On Thu, Feb 17, 2011 at 5:45 AM, Steve Reinhardt <[email protected]> wrote:
>>
>>> Hi Rick,
>>>
>>> I'm a little confused by your statement "there is no recursion to add
>>> the children of params". Being a param value and being a child are
>>> separate things, because an object A can be a param of many other
>>> objects but can only be the child of one other object. The only
>>> relationship between the two is that if A is set as a param value for
>>> a param of B and A does not have a parent, then A will also
>>> implicitly be set as a child of B. (See towards the end of
>>> SimObject.__setattr__().)
>>>
>>> So every SimObject param value *should* be the child of *some*
>>> SimObject, so iterating over param values shouldn't be necessary. The
>>> whole point of adoptOrphanParams() is to make sure this is true; it's
>>> the one place we iterate over all the param values, just to make sure
>>> that they all have parents (and to set them if they don't).
>>>
>>> Also, the adoptOrphanParams() method traverses the whole tree (see
>>> simulate.py) using the descendants() call, which is a pre-order
>>> traversal, so any new children that are added at a particular node
>>> should be traversed automatically.
>>>
>>> Your configuration should not be affected by whether you're restoring
>>> from a checkpoint or not... the config gets built first, then if
>>> there's a checkpoint it gets restored.
>>>
>>> I rewrote all this code last summer to clean it up, so I'm very
>>> interested in figuring out where the bugs are.
>>>
>>> Steve
>>>
>>>
>>> On Wed, Feb 16, 2011 at 9:48 PM, Richard Strong <[email protected]> wrote:
>>>
>>>> I took a close look at this problem because the same thing happens
>>>> to me. It only occurs when I use the O3CPU model when resuming from
>>>> a checkpoint. What I find is that config.ini has orphan for the
>>>> FUList parameter of the O3CPU model. Further, none of the function
>>>> units are adopted by fuPool. I think the problem lies in
>>>> SimObject.py::add_child(self, name, child) and
>>>> SimObject.py::adoptOrphanParams(self). I think that there is no
>>>> recursion to add the children of params. I tried a simple change at
>>>> the end of add_child: calling adoptOrphanParams() on the child
>>>> (change shown below). This allows the setup code to get further, but
>>>> now I die with:
>>>>
>>>> "AttributeError: 'AnyProxy' object has no attribute 'getValue'".
>>>>
>>>> I was wondering if someone knows what is going wrong? Did a recent
>>>> change forget to go down enough recursive levels when adopting
>>>> children nodes?
>>>>
>>>> Best,
>>>> -Rick
>>>>
>>>>     def add_child(self, name, child):
>>>>         print "\t in add_child name=%s child=%s"%(name, child)
>>>>         child = coerceSimObjectOrVector(child)
>>>>         if child.get_parent():
>>>>             raise RuntimeError, \
>>>>                 "add_child('%s'): child '%s' already has parent '%s'" % \
>>>>                 (name, child._name, child._parent)
>>>>         if self._children.has_key(name):
>>>>             # This code path had an undiscovered bug that would make it fail
>>>>             # at runtime. It had been here for a long time and was only
>>>>             # exposed by a buggy script. Changes here will probably not be
>>>>             # exercised without specialized testing.
>>>>             self.clear_child(name)
>>>>         child.set_parent(self, name)
>>>>         self._children[name] = child
>>>>         if isSimObjectVector(child):
>>>>             for obj in child:
>>>>                 obj.adoptOrphanParams()
>>>>         elif isSimObjectOrVector(child):
>>>>             child.adoptOrphanParams()
>>>>
>>>>> On Fri, Feb 11, 2011 at 11:05 PM, Joel Hestness <[email protected]> wrote:
>>>>>
>>>>>> Hi Sheng,
>>>>>>   I've dug back through some of my simulations, and I haven't been
>>>>>> able to find a case where I used 4GB of simulated memory, so I
>>>>>> don't know if I have a baseline to show that the checkpoint
>>>>>> restore works with that much memory. On the other hand, I have
>>>>>> simulated with 512MB and 1GB of simulated memory, and it has
>>>>>> worked fine. For full-system simulations, we often mount a swap
>>>>>> disk in the simulated system in order to avoid the small virtual
>>>>>> memory constraints imposed by the operating system. I'd have to
>>>>>> defer to others on the list for knowledge about whether that would
>>>>>> work with SE mode.
>>>>>>   I can attempt to address your other questions as well:
>>>>>>   1) The way that you described the O3 parameters is how I have
>>>>>> set them in the past, so that should work.
>>>>>>   2) I've seen this problem before... It has had to do with the
>>>>>> way that certain SimObjects are instantiated as children of other
>>>>>> SimObjects at the beginning of the simulation, and with checkpoint
>>>>>> restore, this isn't the cleanest process. When I ran into this
>>>>>> problem, I was working on getting x86 timing mode working with
>>>>>> Ruby, and Brad Beckmann was able to help me debug. He might be
>>>>>> able to suggest first steps for figuring out what's wrong here.
>>>>>>   Hope this helps,
>>>>>>   Joel
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 9, 2011 at 3:14 PM, Sheng Li <[email protected]> wrote:
>>>>>>
>>>>>>> And two other questions:
>>>>>>>
>>>>>>> 1. What should I do to change the O3 parameters such as
>>>>>>> issueWidth, commitWidth, etc.? I added a few lines in se.py as
>>>>>>> below. It runs fine if I just run the benchmarks, but if I resume
>>>>>>> a checkpoint (created without the -d option), then it will
>>>>>>> complain the CPU class has no such parameters. I think these
>>>>>>> parameters can only be set after M5 performs the CPU mode switch;
>>>>>>> how can I set these parameters so that M5 will use them after
>>>>>>> switching CPU mode?
>>>>>>>
>>>>>>> if options.detailed:
>>>>>>>     CPUClass.commitWidth = 4
>>>>>>>     CPUClass.decodeWidth = 4
>>>>>>>     CPUClass.dispatchWidth = 4
>>>>>>>     CPUClass.fetchWidth = 4
>>>>>>>     CPUClass.issueWidth = 4
>>>>>>>     CPUClass.commitWidth = 4
>>>>>>>     CPUClass.renameWidth = 4
>>>>>>>     CPUClass.squashWidth = 4
>>>>>>>     CPUClass.wbWidth = 4
>>>>>>>     CPUClass.numROBEntries = 128
>>>>>>>     CPUClass.numIQEntries = 36
>>>>>>>     CPUClass.LQEntries = 48
>>>>>>>
>>>>>>> 2. When I resume a checkpoint with the -d --caches options, I get
>>>>>>> "RuntimeError: Attempt to instantiate orphan node". I am trying
>>>>>>> to figure out what the orphan node is. What should I do to find
>>>>>>> the orphan node? I tried "print self.name" in File
>>>>>>> "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py",
>>>>>>> line 822, in getCCObject, but got nothing.
>>>>>>>
>>>>>>>
>>>>>>> command line: ./build/ALPHA_SE/m5.opt configs/example/se.py --bench bzip2 --checkpoint-restore=0 --simpoint -d --caches --l2cache
>>>>>>> 2200
>>>>>>> m5out/cpt.bzip2.2200
>>>>>>>
>>>>>>> Global frequency set at 1000000000000 ticks per second
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "<string>", line 1, in ?
>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/main.py", line 359, in main
>>>>>>>     exec filecode in scope
>>>>>>>   File "configs/example/se.py", line 179, in ?
>>>>>>>     Simulation.run(options, root, system, FutureClass)
>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-work-stable/configs/common/Simulation.py", line 236, in run
>>>>>>>     m5.instantiate(checkpoint_dir)
>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-work-stable/src/python/m5/simulate.py", line 77, in instantiate
>>>>>>>     for obj in root.descendants(): obj.createCCObject()
>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 841, in createCCObject
>>>>>>>     def createCCObject(self):
>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 796, in getCCParams
>>>>>>>     value = value.getValue()
>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 845, in getValue
>>>>>>>     def getValue(self):
>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 826, in getCCObject
>>>>>>>     self._ccObject = -1
>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 796, in getCCParams
>>>>>>>     value = value.getValue()
>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/params.py", line 183, in getValue
>>>>>>>     return [ v.getValue() for v in self ]
>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 845, in getValue
>>>>>>>     def getValue(self):
>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 822, in getCCObject
>>>>>>>     #print self.name
>>>>>>> RuntimeError: Attempt to instantiate orphan node
>>>>>>>
>>>>>>> Thanks a lot!
>>>>>>> -Sheng
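
The order dependence Richard describes (the workload ending up under switch_cpus instead of system.cpu) can be illustrated with a toy model. This is only a sketch under stated assumptions: Node, set_param, adopt_orphan_params and workload_parent are hypothetical stand-ins written for this example, not the m5 SimObject code, and the sketch models only the "adopt any param value that has no parent yet" pass over a pre-order traversal that Steve describes. Whether this is what actually goes wrong in m5 is not confirmed in the thread.

```python
class Node:
    """Toy stand-in for a SimObject (hypothetical, not the m5 class)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []
        self.params = {}

    def set_param(self, key, value):
        # Simplification: referencing an object as a param does not adopt it
        # here; adoption is deferred to adopt_orphan_params().
        self.params[key] = value

    def add_child(self, child):
        # An object may be a param of many nodes but a child of only one.
        if child.parent is not None:
            raise RuntimeError("%s already has parent %s"
                               % (child.name, child.parent.name))
        child.parent = self
        self.children.append(child)

    def adopt_orphan_params(self):
        # Give every parentless param value a parent: this node.
        for value in self.params.values():
            if value.parent is None:
                self.add_child(value)

    def descendants(self):
        # Pre-order traversal; children appended while the traversal is
        # paused at this node are still visited afterwards.
        yield self
        for child in self.children:
            yield from child.descendants()

    def path(self):
        # Dotted path from the root, in the style of config.ini sections.
        if self.parent is None:
            return self.name
        return self.parent.path() + "." + self.name


def workload_parent(cpu_order):
    """Build a tiny tree where two CPUs share one workload param and
    report where the workload ends up after the adoption pass."""
    system = Node("system")
    workload = Node("workload")
    for cpu_name in cpu_order:
        cpu = Node(cpu_name)
        cpu.set_param("workload", workload)
        system.add_child(cpu)
    for obj in system.descendants():
        obj.adopt_orphan_params()
    return workload.path()


print(workload_parent(["cpu", "switch_cpus"]))
print(workload_parent(["switch_cpus", "cpu"]))
```

In this toy, whichever CPU the traversal reaches first becomes the workload's parent, so the workload's path changes with the order in which the two CPUs were attached. An explicit add_child() call on the intended parent before the adoption pass would pin the relationship in this sketch.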
