Hi Leonard, I do have a patch for this issue, but haven't gotten around to pushing it yet: http://reviews.m5sim.org/r/608/
If you have the chance to download it from reviewboard and verify that it solves your problem that would be helpful.

Steve

On Thu, Apr 7, 2011 at 4:53 PM, Sage <[email protected]> wrote:
> Hi, Steve and Rick,
>
> You two discussed the problem of resuming a checkpoint when --l2cache and
> --detailed are both specified. But I noticed that the problem occurs when
> --l2cache and (--timing or --detailed) are specified but it won't happen if
> only the "--l2cache" option is there. Have you figured out a way of solving
> the problem?
>
> Thanks,
> Leonard
>
> On Wed, Mar 2, 2011 at 9:32 AM, Steve Reinhardt <[email protected]> wrote:
>>
>> FYI, I finally got around to reproducing this, and I think I see what the
>> problem is. Unfortunately I don't see a really trivial fix, but I've got
>> some ideas I'll work on to see if I can take care of it.
>>
>> Steve
>>
>> On Fri, Feb 18, 2011 at 5:21 AM, Steve Reinhardt <[email protected]> wrote:
>>>
>>> BTW, thanks for the detailed example... I've been traveling, but I'll see
>>> if I can reproduce this when I get home.
>>>
>>> Steve
>>>
>>> On Thu, Feb 17, 2011 at 11:21 AM, Richard Strong <[email protected]> wrote:
>>>>
>>>> Here is the process I went through on a fresh checkout of m5 this
>>>> morning.
>>>>
>>>> (1) hg clone http://repo.m5sim.org/m5
>>>>
>>>> (2) cd m5
>>>>
>>>> (3) scons build/ALPHA_SE/m5.opt
>>>>
>>>> (4) build/ALPHA_SE/m5.opt configs/example/se.py --take-checkpoint=1 --at-instruction
>>>>
>>>> (5) build/ALPHA_SE/m5.opt configs/example/se.py --checkpoint-restore=1 --at-instruction -d --caches --l2cache
>>>> M5 Simulator System
>>>>
>>>> Copyright (c) 2001-2008
>>>> The Regents of The University of Michigan
>>>> All Rights Reserved
>>>>
>>>> M5 compiled Feb 17 2011 09:41:58
>>>> M5 revision 96bde0910197+ 8031+ default tip
>>>> M5 started Feb 17 2011 09:54:32
>>>> M5 executing on rstrong-desktop
>>>> command line: build/ALPHA_SE/m5.opt configs/example/se.py --checkpoint-restore=1 --at-instruction -d --caches --l2cache
>>>> Global frequency set at 1000000000000 ticks per second
>>>> 0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
>>>> Switch at curTick count:10000
>>>> info: Entering event queue @ 1000. Starting simulation...
>>>> panic: Tried to access unmapped address 0x12008b488.
>>>> @ cycle 2500
>>>> [invoke:build/ALPHA_SE/arch/alpha/faults.cc, line 208]
>>>> Memory Usage: 586300 KBytes
>>>> For more information see: http://www.m5sim.org/panic/5932f339
>>>> Program aborted at cycle 2500
>>>> Aborted
>>>>
>>>> The problem seen in the output of (5) above is caused by the workload
>>>> being adopted by switch_cpus as its parent as opposed to system.cpu. My
>>>> original fix was to modify simulate.py to adopt orphans in sorted order,
>>>> but this appears to create orphans for fuPool as shown in the snippet of
>>>> config.ini below. This makes me think that something is broken in the
>>>> design, as it depends on the order in which objects come up whether
>>>> certain objects become orphans or whether checkpoint files work. Is there
>>>> any way to explicitly set the parent-child relationship if you want to
>>>> avoid this nondeterminism?
>>>>
>>>> config.ini selected output:
>>>> [system.switch_cpus.fuPool]
>>>> type=FUPool
>>>> FUList=(orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan)
>>>>
>>>> On Thu, Feb 17, 2011 at 5:45 AM, Steve Reinhardt <[email protected]> wrote:
>>>>>
>>>>> Hi Rick,
>>>>>
>>>>> I'm a little confused by your statement "there is no recursion to add
>>>>> the children of params". Being a param value and being a child are
>>>>> separate things, because an object A can be a param of many other
>>>>> objects but can only be the child of one other object. The only
>>>>> relationship between the two is that if A is set as a param value for a
>>>>> param of B and A does not have a parent, then A will also implicitly be
>>>>> set as a child of B. (See towards the end of SimObject.__setattr__().)
>>>>>
>>>>> So every SimObject param value *should* be the child of *some*
>>>>> SimObject, so iterating over param values shouldn't be necessary. The
>>>>> whole point of adoptOrphanParams() is to make sure this is true; it's
>>>>> the one place we iterate over all the param values, just to make sure
>>>>> that they all have parents (and to set them if they don't).
>>>>>
>>>>> Also, the adoptOrphanParams() method traverses the whole tree (see
>>>>> simulate.py) using the descendants() call, which is a pre-order
>>>>> traversal, so any new children that are added at a particular node
>>>>> should be traversed automatically.
>>>>>
>>>>> Your configuration should not be affected by whether you're restoring
>>>>> from a checkpoint or not... the config gets built first, then if there's
>>>>> a checkpoint it gets restored.
>>>>>
>>>>> I rewrote all this code last summer to clean it up, so I'm very
>>>>> interested in figuring out where the bugs are.
>>>>>
>>>>> Steve
>>>>>
>>>>> On Wed, Feb 16, 2011 at 9:48 PM, Richard Strong <[email protected]> wrote:
>>>>>>
>>>>>> I took a close look at this problem because the same thing happens to
>>>>>> me. It only occurs when I use the O3CPU model when resuming from a
>>>>>> checkpoint. What I find is that config.ini has orphan for the FUList
>>>>>> parameter of the O3CPU model. Further, none of the function units are
>>>>>> adopted by fuPool. I think the problem lies in
>>>>>> SimObject.py::add_child(self, name, child) and
>>>>>> SimObject.py::adoptOrphanParams(self). I think that there is no
>>>>>> recursion to add the children of params. I tried a simple change at
>>>>>> the end of add_child, calling adoptOrphanParams() on the child (change
>>>>>> shown below). This allows the setup code to get further, but now I die
>>>>>> with:
>>>>>>
>>>>>> "AttributeError: 'AnyProxy' object has no attribute 'getValue'". I was
>>>>>> wondering if someone knows what is going wrong? Did a recent change
>>>>>> forget to go down enough recursive levels when adopting children nodes?
>>>>>>
>>>>>> Best,
>>>>>> -Rick
>>>>>>
>>>>>> def add_child(self, name, child):
>>>>>>     print "\t in add_child name=%s child=%s"%(name, child)
>>>>>>     child = coerceSimObjectOrVector(child)
>>>>>>     if child.get_parent():
>>>>>>         raise RuntimeError, \
>>>>>>             "add_child('%s'): child '%s' already has parent '%s'" % \
>>>>>>             (name, child._name, child._parent)
>>>>>>     if self._children.has_key(name):
>>>>>>         # This code path had an undiscovered bug that would make it fail
>>>>>>         # at runtime. It had been here for a long time and was only
>>>>>>         # exposed by a buggy script. Changes here will probably not be
>>>>>>         # exercised without specialized testing.
>>>>>>         self.clear_child(name)
>>>>>>     child.set_parent(self, name)
>>>>>>     self._children[name] = child
>>>>>>     if isSimObjectVector(child):
>>>>>>         for obj in child:
>>>>>>             obj.adoptOrphanParams()
>>>>>>     elif isSimObjectOrVector(child):
>>>>>>         child.adoptOrphanParams()
>>>>>>>
>>>>>>> On Fri, Feb 11, 2011 at 11:05 PM, Joel Hestness <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi Sheng,
>>>>>>>> I've dug back through some of my simulations, and I haven't been
>>>>>>>> able to find a case where I used 4GB of simulated memory, so I don't
>>>>>>>> know if I have a baseline to show that the checkpoint restore works
>>>>>>>> with that much memory. On the other hand, I have simulated with 512MB
>>>>>>>> and 1GB of simulated memory, and it has worked fine. For full-system
>>>>>>>> simulations, we often mount a swap disk in the simulated system in
>>>>>>>> order to avoid the small virtual memory constraints imposed by the
>>>>>>>> operating system. I'd have to defer to others on the list for
>>>>>>>> knowledge about whether that would work with SE mode.
>>>>>>>> I can attempt to address your other questions as well:
>>>>>>>> 1) The way that you described the O3 parameters is how I have set
>>>>>>>> them in the past, so that should work.
>>>>>>>> 2) I've seen this problem before... It has had to do with the way
>>>>>>>> that certain SimObjects are instantiated as children of other
>>>>>>>> SimObjects at the beginning of the simulation, and with checkpoint
>>>>>>>> restore, this isn't the cleanest process. When I ran into this
>>>>>>>> problem, I was working on getting x86 timing mode working with Ruby,
>>>>>>>> and Brad Beckmann was able to help me debug. He might be able to
>>>>>>>> suggest first steps for figuring out what's wrong here.
>>>>>>>> Hope this helps,
>>>>>>>> Joel
>>>>>>>>
>>>>>>>> On Wed, Feb 9, 2011 at 3:14 PM, Sheng Li <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> And two other questions:
>>>>>>>>>
>>>>>>>>> 1. What should I do to change the O3 parameters such as issueWidth,
>>>>>>>>> commitWidth, etc.? I added a few lines in se.py as below. It runs
>>>>>>>>> fine if I just run the benchmarks, but if I resume a checkpoint
>>>>>>>>> (created without the -d option), then it will complain that the CPU
>>>>>>>>> class has no such parameters. I think these parameters can only be
>>>>>>>>> set after M5 performs the CPU mode switch, so how can I set these
>>>>>>>>> parameters so that M5 will use them after switching CPU mode?
>>>>>>>>>
>>>>>>>>> if options.detailed:
>>>>>>>>>     CPUClass.commitWidth = 4
>>>>>>>>>     CPUClass.decodeWidth = 4
>>>>>>>>>     CPUClass.dispatchWidth = 4
>>>>>>>>>     CPUClass.fetchWidth = 4
>>>>>>>>>     CPUClass.issueWidth = 4
>>>>>>>>>     CPUClass.commitWidth = 4
>>>>>>>>>     CPUClass.renameWidth = 4
>>>>>>>>>     CPUClass.squashWidth = 4
>>>>>>>>>     CPUClass.wbWidth = 4
>>>>>>>>>     CPUClass.numROBEntries = 128
>>>>>>>>>     CPUClass.numIQEntries = 36
>>>>>>>>>     CPUClass.LQEntries = 48
>>>>>>>>>
>>>>>>>>> 2. When I resume a checkpoint with the -d --caches options, I get
>>>>>>>>> "RuntimeError: Attempt to instantiate orphan node". I am trying to
>>>>>>>>> figure out what the orphan node is. What should I do to find the
>>>>>>>>> orphan node? I tried "print self.name" in File
>>>>>>>>> "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py",
>>>>>>>>> line 822, in getCCObject, but got nothing.
>>>>>>>>>
>>>>>>>>> command line: ./build/ALPHA_SE/m5.opt configs/example/se.py --bench bzip2 --checkpoint-restore=0 --simpoint -d --caches --l2cache
>>>>>>>>> 2200
>>>>>>>>> m5out/cpt.bzip2.2200
>>>>>>>>> Global frequency set at 1000000000000 ticks per second
>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>   File "<string>", line 1, in ?
>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/main.py", line 359, in main
>>>>>>>>>     exec filecode in scope
>>>>>>>>>   File "configs/example/se.py", line 179, in ?
>>>>>>>>>     Simulation.run(options, root, system, FutureClass)
>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-work-stable/configs/common/Simulation.py", line 236, in run
>>>>>>>>>     m5.instantiate(checkpoint_dir)
>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-work-stable/src/python/m5/simulate.py", line 77, in instantiate
>>>>>>>>>     for obj in root.descendants(): obj.createCCObject()
>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 841, in createCCObject
>>>>>>>>>     def createCCObject(self):
>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 796, in getCCParams
>>>>>>>>>     value = value.getValue()
>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 845, in getValue
>>>>>>>>>     def getValue(self):
>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 826, in getCCObject
>>>>>>>>>     self._ccObject = -1
>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 796, in getCCParams
>>>>>>>>>     value = value.getValue()
>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/params.py", line 183, in getValue
>>>>>>>>>     return [ v.getValue() for v in self ]
>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 845, in getValue
>>>>>>>>>     def getValue(self):
>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 822, in getCCObject
>>>>>>>>>     #print self.name
>>>>>>>>> RuntimeError: Attempt to instantiate orphan node
>>>>>>>>>
>>>>>>>>> Thanks a lot!
>>>>>>>>> -Sheng
>>
>> _______________________________________________
>> m5-users mailing list
>> [email protected]
>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
>
> --
> Give our ability to our work, but our genius to our life!
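
For readers following the thread, here is a small stand-alone sketch of the adoption rule Steve describes (a param value with no parent becomes a child of whichever object claims it first during a pre-order walk) and of the ordering effect Rick reports. It is a toy model, not m5 code: the Node class, set_param(), adopt_orphan_params(), and build() are hypothetical names made up for illustration, and the real SimObject.py logic is more involved.

```python
class Node:
    """Toy stand-in for a SimObject (hypothetical, not the m5 class)."""

    def __init__(self, name=None):
        self.name = name
        self.parent = None
        self.children = {}   # child name -> Node; each Node has one parent
        self.params = {}     # param name -> Node; may be shared by many

    def add_child(self, name, child):
        if child.parent is not None:
            raise RuntimeError("child '%s' already has a parent" % name)
        child.name = name
        child.parent = self
        self.children[name] = child

    def set_param(self, name, value):
        # Only record the reference here, so that orphans can exist
        # until the adoption pass runs.
        self.params[name] = value

    def adopt_orphan_params(self):
        # Any param value that still has no parent becomes our child.
        for pname, value in self.params.items():
            if value.parent is None:
                self.add_child(pname, value)

    def descendants(self):
        # Pre-order walk. The child list is read only after the caller
        # has processed this node, so newly adopted children are visited.
        yield self
        for child in list(self.children.values()):
            yield from child.descendants()

    def path(self):
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return ".".join(reversed(parts))


def build(cpu_order):
    """Share one workload between two CPUs, then run the adoption pass."""
    root = Node("root")
    system = Node()
    root.add_child("system", system)
    workload = Node()
    for cpu_name in cpu_order:
        cpu = Node()
        system.add_child(cpu_name, cpu)
        cpu.set_param("workload", workload)
    for obj in root.descendants():
        obj.adopt_orphan_params()
    return workload.path()


print(build(["cpu", "switch_cpus"]))
print(build(["switch_cpus", "cpu"]))
```

In this toy model the workload ends up under whichever CPU the traversal reaches first, so its full path changes with the order in which the CPUs were attached. That is at least consistent with Rick's observation that the workload was adopted by switch_cpus rather than system.cpu, though the sketch makes no claim about how the fix in the patch works.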
