Hi Steve and Rick,

You two discussed the problem of resuming a checkpoint when --l2cache and --detailed are both specified. I have noticed that the problem occurs whenever --l2cache is combined with either --timing or --detailed, but it does not happen if --l2cache is the only option given. Have you figured out a way of solving the problem?

Thanks,
Leonard

On Wed, Mar 2, 2011 at 9:32 AM, Steve Reinhardt <[email protected]> wrote:
> FYI, I finally got around to reproducing this, and I think I see what the
> problem is. Unfortunately I don't see a really trivial fix, but I've got
> some ideas I'll work on to see if I can take care of it.
>
> Steve
>
>
> On Fri, Feb 18, 2011 at 5:21 AM, Steve Reinhardt <[email protected]> wrote:
>
>> BTW, thanks for the detailed example... I've been traveling, but I'll see
>> if I can reproduce this when I get home.
>>
>> Steve
>>
>> On Thu, Feb 17, 2011 at 11:21 AM, Richard Strong <[email protected]> wrote:
>>
>>> Here is the process I went through on a fresh checkout of m5 this
>>> morning.
>>>
>>> (1) hg clone http://repo.m5sim.org/m5
>>>
>>> (2) cd m5
>>>
>>> (3) scons build/ALPHA_SE/m5.opt
>>>
>>> (4) build/ALPHA_SE/m5.opt configs/example/se.py --take-checkpoint=1 --at-instruction
>>>
>>> (5) build/ALPHA_SE/m5.opt configs/example/se.py --checkpoint-restore=1 --at-instruction -d --caches --l2cache
>>> M5 Simulator System
>>>
>>> Copyright (c) 2001-2008
>>> The Regents of The University of Michigan
>>> All Rights Reserved
>>>
>>>
>>> M5 compiled Feb 17 2011 09:41:58
>>> M5 revision 96bde0910197+ 8031+ default tip
>>> M5 started Feb 17 2011 09:54:32
>>> M5 executing on rstrong-desktop
>>> command line: build/ALPHA_SE/m5.opt configs/example/se.py --checkpoint-restore=1 --at-instruction -d --caches --l2cache
>>>
>>> Global frequency set at 1000000000000 ticks per second
>>> 0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
>>> Switch at curTick count:10000
>>> info: Entering event queue @ 1000. Starting simulation...
>>> panic: Tried to access unmapped address 0x12008b488.
>>> @ cycle 2500
>>> [invoke:build/ALPHA_SE/arch/alpha/faults.cc, line 208]
>>> Memory Usage: 586300 KBytes
>>> For more information see: http://www.m5sim.org/panic/5932f339
>>> Program aborted at cycle 2500
>>> Aborted
>>>
>>> The problem seen in the output of (5) above is caused by the workload
>>> being adopted by switch_cpus as its parent as opposed to system.cpu. My
>>> original fix was to modify simulate.py to adopt orphans in sorted order,
>>> but this appears to create orphans for fuPool as shown in the snippet of
>>> config.ini below. This makes me think that something is broken in the
>>> design, as it depends on the order in which objects come up whether
>>> certain objects become orphans or whether checkpoint files work. Is there
>>> any way to explicitly set the parent-child relationship if you want to
>>> avoid this nondeterminism?
>>>
>>> config.ini selected output:
>>> [system.switch_cpus.fuPool]
>>> type=FUPool
>>> FUList=(orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan)
>>>
>>>
>>> On Thu, Feb 17, 2011 at 5:45 AM, Steve Reinhardt <[email protected]> wrote:
>>>
>>>> Hi Rick,
>>>>
>>>> I'm a little confused by your statement "there is no recursion to add
>>>> the children of params". Being a param value and being a child are
>>>> separate things, because an object A can be a param of many other
>>>> objects but can only be the child of one other object. The only
>>>> relationship between the two is that if A is set as a param value for a
>>>> param of B and A does not have a parent, then A will also implicitly be
>>>> set as a child of B. (See towards the end of SimObject.__setattr__().)
>>>>
>>>> So every SimObject param value *should* be the child of *some*
>>>> SimObject, so iterating over param values shouldn't be necessary. The
>>>> whole point of adoptOrphanParams() is to make sure this is true; it's
>>>> the one place we iterate over all the param values, just to make sure
>>>> that they all have parents (and to set them if they don't).
>>>>
>>>> Also, the adoptOrphanParams() method traverses the whole tree (see
>>>> simulate.py) using the descendants() call which is a pre-order
>>>> traversal, so any new children that are added at a particular node
>>>> should be traversed automatically.
>>>>
>>>> Your configuration should not be affected by whether you're restoring
>>>> from a checkpoint or not... the config gets built first, then if there's
>>>> a checkpoint it gets restored.
>>>>
>>>> I rewrote all this code last summer to clean it up, so I'm very
>>>> interested in figuring out where the bugs are.
>>>>
>>>> Steve
>>>>
>>>>
>>>> On Wed, Feb 16, 2011 at 9:48 PM, Richard Strong <[email protected]> wrote:
>>>>
>>>>> I took a close look at this problem because the same thing happens to
>>>>> me. It only occurs when I use the O3CPU model when resuming from a
>>>>> checkpoint. What I find is that config.ini has orphan for the FUList
>>>>> parameter of the O3CPU model. Further, none of the function units are
>>>>> adopted by fuPool. I think the problem lies in
>>>>> SimObject.py::add_child(self, name, child) and
>>>>> SimObject.py::adoptOrphanParams(self). I think that there is no
>>>>> recursion to add the children of params. I tried a simple change at the
>>>>> end of add_child, so that I adoptOrphanParams() of the child (change
>>>>> shown below). This allows the setup code to get further but now I die
>>>>> with:
>>>>>
>>>>> "AttributeError: 'AnyProxy' object has no attribute 'getValue'". I was
>>>>> wondering if someone knows what is going wrong? Did a recent change
>>>>> forget to go down enough recursive levels when adopting children nodes?
>>>>>
>>>>> Best,
>>>>> -Rick
>>>>>
>>>>> def add_child(self, name, child):
>>>>>     print "\t in add_child name=%s child=%s"%(name, child)
>>>>>     child = coerceSimObjectOrVector(child)
>>>>>     if child.get_parent():
>>>>>         raise RuntimeError, \
>>>>>               "add_child('%s'): child '%s' already has parent '%s'" % \
>>>>>               (name, child._name, child._parent)
>>>>>     if self._children.has_key(name):
>>>>>         # This code path had an undiscovered bug that would make it fail
>>>>>         # at runtime. It had been here for a long time and was only
>>>>>         # exposed by a buggy script. Changes here will probably not be
>>>>>         # exercised without specialized testing.
>>>>>         self.clear_child(name)
>>>>>     child.set_parent(self, name)
>>>>>     self._children[name] = child
>>>>>     if isSimObjectVector(child):
>>>>>         for obj in child:
>>>>>             obj.adoptOrphanParams()
>>>>>     elif isSimObjectOrVector(child):
>>>>>         child.adoptOrphanParams()
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Feb 11, 2011 at 11:05 PM, Joel Hestness <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Sheng,
>>>>>>> I've dug back through some of my simulations, and I haven't been
>>>>>>> able to find a case where I used 4GB of simulated memory, so I don't
>>>>>>> know if I have a baseline to show that the checkpoint restore works
>>>>>>> with that much memory. On the other hand, I have simulated with 512MB
>>>>>>> and 1GB of simulated memory, and it has worked fine. For full-system
>>>>>>> simulations, we often mount a swap disk in the simulated system in
>>>>>>> order to avoid the small virtual memory constraints imposed by the
>>>>>>> operating system. I'd have to defer to others on the list for
>>>>>>> knowledge about whether that would work with SE mode.
>>>>>>> I can attempt to address your other questions as well:
>>>>>>> 1) The way that you described the O3 parameters is how I have set
>>>>>>> them in the past, so that should work.
>>>>>>> 2) I've seen this problem before... It has had to do with the way
>>>>>>> that certain SimObjects are instantiated as children of other
>>>>>>> SimObjects at the beginning of the simulation, and with checkpoint
>>>>>>> restore, this isn't the cleanest process. When I ran into this
>>>>>>> problem, I was working on getting x86 timing mode working with Ruby,
>>>>>>> and Brad Beckmann was able to help me debug. He might be able to
>>>>>>> suggest first steps for figuring out what's wrong here.
>>>>>>> Hope this helps,
>>>>>>> Joel
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Feb 9, 2011 at 3:14 PM, Sheng Li <[email protected]> wrote:
>>>>>>>
>>>>>>>> And two other questions:
>>>>>>>>
>>>>>>>> 1. What should I do to change the O3 parameters such as issueWidth,
>>>>>>>> commitWidth, etc? I added a few lines in se.py as below. It runs
>>>>>>>> fine if I just run the benchmarks, but if I resume a checkpoint
>>>>>>>> (created without -d option), then it will complain the CPU class has
>>>>>>>> no such parameters. I think these parameters can only be set after
>>>>>>>> M5 performs CPU mode switch, then how can I set these parameters so
>>>>>>>> that M5 will use them after switching CPU mode?
>>>>>>>>
>>>>>>>> if options.detailed:
>>>>>>>>     CPUClass.commitWidth = 4
>>>>>>>>     CPUClass.decodeWidth = 4
>>>>>>>>     CPUClass.dispatchWidth = 4
>>>>>>>>     CPUClass.fetchWidth = 4
>>>>>>>>     CPUClass.issueWidth = 4
>>>>>>>>     CPUClass.commitWidth = 4
>>>>>>>>     CPUClass.renameWidth = 4
>>>>>>>>     CPUClass.squashWidth = 4
>>>>>>>>     CPUClass.wbWidth = 4
>>>>>>>>     CPUClass.numROBEntries = 128
>>>>>>>>     CPUClass.numIQEntries = 36
>>>>>>>>     CPUClass.LQEntries = 48
>>>>>>>>
>>>>>>>> 2. When I resume a checkpoint with -d --caches options, I got
>>>>>>>> RuntimeError: Attempt to instantiate orphan node. I am trying to
>>>>>>>> figure out what the orphan node is. What should I do to find the
>>>>>>>> orphan node? I tried "print self.name" in File
>>>>>>>> "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py",
>>>>>>>> line 822, in getCCObject, but got nothing.
>>>>>>>>
>>>>>>>>
>>>>>>>> command line: ./build/ALPHA_SE/m5.opt configs/example/se.py --bench bzip2 --checkpoint-restore=0 --simpoint -d --caches --l2cache
>>>>>>>> 2200
>>>>>>>> m5out/cpt.bzip2.2200
>>>>>>>>
>>>>>>>> Global frequency set at 1000000000000 ticks per second
>>>>>>>> Traceback (most recent call last):
>>>>>>>>   File "<string>", line 1, in ?
>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/main.py", line 359, in main
>>>>>>>>     exec filecode in scope
>>>>>>>>   File "configs/example/se.py", line 179, in ?
>>>>>>>>     Simulation.run(options, root, system, FutureClass)
>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-work-stable/configs/common/Simulation.py", line 236, in run
>>>>>>>>     m5.instantiate(checkpoint_dir)
>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-work-stable/src/python/m5/simulate.py", line 77, in instantiate
>>>>>>>>     for obj in root.descendants(): obj.createCCObject()
>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 841, in createCCObject
>>>>>>>>     def createCCObject(self):
>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 796, in getCCParams
>>>>>>>>     value = value.getValue()
>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 845, in getValue
>>>>>>>>     def getValue(self):
>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 826, in getCCObject
>>>>>>>>     self._ccObject = -1
>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 796, in getCCParams
>>>>>>>>     value = value.getValue()
>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/params.py", line 183, in getValue
>>>>>>>>     return [ v.getValue() for v in self ]
>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 845, in getValue
>>>>>>>>     def getValue(self):
>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 822, in getCCObject
>>>>>>>>     #print self.name
>>>>>>>> RuntimeError: Attempt to instantiate orphan node
>>>>>>>>
>>>>>>>> Thanks a lot!
>>>>>>>> -Sheng
>>>>>>>>
>>>>>>>
>>>>
>>>
>>
>
> _______________________________________________
> m5-users mailing list
> [email protected]
> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
>

--
Give our ability to our work, but our genius to our life!
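
For anyone following the thread, here is a rough, self-contained sketch of the behaviour Steve and Rick describe in the quoted messages: a param value that has no parent gets adopted by the first object that visits it during a pre-order walk, so which object ends up as the parent depends on the visiting order. This is only a toy model. Node, set_param, adopt_orphan_params, adopt_all and workload_owner are hypothetical names made up for illustration; this is not the code in SimObject.py or simulate.py, and it is written in current Python rather than the Python 2 used by the m5 sources quoted above.

```python
class Node:
    """Toy stand-in for a SimObject: at most one parent, any number of params."""

    def __init__(self, label):
        self.label = label
        self.parent = None
        self.children = {}  # child name -> Node
        self.params = {}    # param name -> Node

    def add_child(self, name, child):
        # An object can be the child of only one other object.
        if child.parent is not None:
            raise RuntimeError("%s already has parent %s"
                               % (child.label, child.parent.label))
        child.parent = self
        self.children[name] = child

    def set_param(self, name, value, adopt=True):
        # A param value may be shared by many objects. With adopt=True an
        # orphan value implicitly becomes a child of this node; adopt=False
        # is an assumption of this sketch, used to leave an orphan behind.
        self.params[name] = value
        if adopt and value.parent is None:
            self.add_child(name, value)

    def adopt_orphan_params(self):
        # Give a parent to any param value that still lacks one.
        for name, value in self.params.items():
            if value.parent is None:
                self.add_child(name, value)

    def descendants(self):
        # Pre-order walk. The child list is read after this node has been
        # yielded, so children adopted while visiting it are walked too.
        yield self
        for child in list(self.children.values()):
            yield from child.descendants()


def adopt_all(root):
    for obj in root.descendants():
        obj.adopt_orphan_params()


def workload_owner(child_order):
    """Return the label of whichever CPU ends up adopting a shared workload."""
    root = Node("root")
    workload = Node("workload")
    for name in child_order:
        cpu = Node(name)
        root.add_child(name, cpu)
        cpu.set_param("workload", workload, adopt=False)
    adopt_all(root)
    return workload.parent.label


print(workload_owner(["cpu", "switch_cpus"]))  # prints "cpu"
print(workload_owner(["switch_cpus", "cpu"]))  # prints "switch_cpus"
```

In this toy model, calling add_child explicitly on the intended parent before running adopt_all would pin the parent regardless of traversal order. Whether the real configuration code can be fixed the same way is exactly the open question in the thread.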
