Hi Steve, I will give it a try and then report back with my feedback.
Thanks,
Leonard

On Sun, Apr 10, 2011 at 7:05 PM, Steve Reinhardt wrote:
> Hi Leonard,
>
> I do have a patch for this issue, but haven't gotten around to pushing it
> yet:
> http://reviews.m5sim.org/r/608/
>
> If you have the chance to download it from reviewboard and verify that
> it solves your problem that would be helpful.
>
> Steve
>
> On Thu, Apr 7, 2011 at 4:53 PM, Sage wrote:
> > Hi, Steve and Rick,
> >
> > You two discussed the problem of resuming a checkpoint when --l2cache and
> > --detailed are both specified. But I noticed that the problem occurs when
> > --l2cache and (--timing or --detailed) are specified but it won't happen if
> > only the "--l2cache" option is there. Have you figured out a way of solving
> > the problem?
> >
> > Thanks,
> > Leonard
> >
> > On Wed, Mar 2, 2011 at 9:32 AM, Steve Reinhardt wrote:
> >> FYI, I finally got around to reproducing this, and I think I see what the
> >> problem is. Unfortunately I don't see a really trivial fix, but I've got
> >> some ideas I'll work on to see if I can take care of it.
> >>
> >> Steve
> >>
> >> On Fri, Feb 18, 2011 at 5:21 AM, Steve Reinhardt wrote:
> >>> BTW, thanks for the detailed example... I've been traveling, but I'll see
> >>> if I can reproduce this when I get home.
> >>>
> >>> Steve
> >>>
> >>> On Thu, Feb 17, 2011 at 11:21 AM, Richard Strong wrote:
> >>>> Here is the process I went through on a fresh checkout of m5 this
> >>>> morning.
> >>>>
> >>>> (1) hg clone http://repo.m5sim.org/m5
> >>>>
> >>>> (2) cd m5
> >>>>
> >>>> (3) scons build/ALPHA_SE/m5.opt
> >>>>
> >>>> (4) build/ALPHA_SE/m5.opt configs/example/se.py --take-checkpoint=1 --at-instruction
> >>>>
> >>>> (5) build/ALPHA_SE/m5.opt configs/example/se.py --checkpoint-restore=1 --at-instruction -d --caches --l2cache
> >>>> M5 Simulator System
> >>>>
> >>>> Copyright (c) 2001-2008
> >>>> The Regents of The University of Michigan
> >>>> All Rights Reserved
> >>>>
> >>>>
> >>>> M5 compiled Feb 17 2011 09:41:58
> >>>> M5 revision 96bde0910197+ 8031+ default tip
> >>>> M5 started Feb 17 2011 09:54:32
> >>>> M5 executing on rstrong-desktop
> >>>> command line: build/ALPHA_SE/m5.opt configs/example/se.py --checkpoint-restore=1 --at-instruction -d --caches --l2cache
> >>>> Global frequency set at 1000000000000 ticks per second
> >>>> 0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
> >>>> Switch at curTick count:10000
> >>>> info: Entering event queue @ 1000. Starting simulation...
> >>>> panic: Tried to access unmapped address 0x12008b488.
> >>>> @ cycle 2500
> >>>> [invoke:build/ALPHA_SE/arch/alpha/faults.cc, line 208]
> >>>> Memory Usage: 586300 KBytes
> >>>> For more information see: http://www.m5sim.org/panic/5932f339
> >>>> Program aborted at cycle 2500
> >>>> Aborted
> >>>>
> >>>> The problem seen in the output of (5) above is caused by the workload
> >>>> being adopted by switch_cpus as its parent as opposed to system.cpu. My
> >>>> original fix was to modify simulate.py to adopt orphans in sorted order,
> >>>> but this appears to create orphans for fuPool as shown in the snippet of
> >>>> config.ini below. This makes me think that something is broken in the
> >>>> design, as it depends on the order in which objects come up whether
> >>>> certain objects become orphans or whether checkpoint files work. Is there
> >>>> any way to explicitly set the parent-child relationship if you want to
> >>>> avoid this nondeterminism?
> >>>>
> >>>> config.ini selected output:
> >>>> [system.switch_cpus.fuPool]
> >>>> type=FUPool
> >>>> FUList=(orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan)
> >>>>
> >>>> On Thu, Feb 17, 2011 at 5:45 AM, Steve Reinhardt wrote:
> >>>>> Hi Rick,
> >>>>>
> >>>>> I'm a little confused by your statement "there is no recursion to add
> >>>>> the children of params". Being a param value and being a child are
> >>>>> separate things, because an object A can be a param of many other
> >>>>> objects but can only be the child of one other object. The only
> >>>>> relationship between the two is that if A is set as a param value for a
> >>>>> param of B and A does not have a parent, then A will also implicitly be
> >>>>> set as a child of B. (See towards the end of SimObject.__setattr__().)
> >>>>>
> >>>>> So every SimObject param value *should* be the child of *some*
> >>>>> SimObject, so iterating over param values shouldn't be necessary. The
> >>>>> whole point of adoptOrphanParams() is to make sure this is true; it's
> >>>>> the one place we iterate over all the param values, just to make sure
> >>>>> that they all have parents (and to set them if they don't).
> >>>>>
> >>>>> Also, the adoptOrphanParams() method traverses the whole tree (see
> >>>>> simulate.py) using the descendants() call which is a pre-order
> >>>>> traversal, so any new children that are added at a particular node
> >>>>> should be traversed automatically.
> >>>>>
> >>>>> Your configuration should not be affected by whether you're restoring
> >>>>> from a checkpoint or not... the config gets built first, then if there's
> >>>>> a checkpoint it gets restored.
> >>>>>
> >>>>> I rewrote all this code last summer to clean it up, so I'm very
> >>>>> interested in figuring out where the bugs are.
> >>>>>
> >>>>> Steve
> >>>>>
> >>>>> On Wed, Feb 16, 2011 at 9:48 PM, Richard Strong wrote:
> >>>>>> I took a close look at this problem because the same thing happens to
> >>>>>> me. It only occurs when I use the O3CPU model when resuming from a
> >>>>>> checkpoint. What I find is that config.ini has orphan for the FUList
> >>>>>> parameter of the O3CPU model. Further, none of the function units are
> >>>>>> adopted by fuPool. I think the problem lies in
> >>>>>> SimObject.py::add_child(self, name, child) and
> >>>>>> SimObject.py::adoptOrphanParams(self). I think that there is no
> >>>>>> recursion to add the children of params. I tried a simple change at the
> >>>>>> end of add_child: calling adoptOrphanParams() on the child (change
> >>>>>> shown below). This allows the setup code to get further but now I die
> >>>>>> with:
> >>>>>>
> >>>>>> "AttributeError: 'AnyProxy' object has no attribute 'getValue'". I was
> >>>>>> wondering if someone knows what is going wrong? Did a recent change
> >>>>>> forget to go down enough recursive levels when adopting children nodes?
> >>>>>>
> >>>>>> Best,
> >>>>>> -Rick
> >>>>>>
> >>>>>> def add_child(self, name, child):
> >>>>>>     print "\t in add_child name=%s child=%s"%(name, child)
> >>>>>>     child = coerceSimObjectOrVector(child)
> >>>>>>     if child.get_parent():
> >>>>>>         raise RuntimeError, \
> >>>>>>               "add_child('%s'): child '%s' already has parent '%s'" % \
> >>>>>>               (name, child._name, child._parent)
> >>>>>>     if self._children.has_key(name):
> >>>>>>         # This code path had an undiscovered bug that would make it fail
> >>>>>>         # at runtime. It had been here for a long time and was only
> >>>>>>         # exposed by a buggy script. Changes here will probably not be
> >>>>>>         # exercised without specialized testing.
> >>>>>>         self.clear_child(name)
> >>>>>>     child.set_parent(self, name)
> >>>>>>     self._children[name] = child
> >>>>>>     if isSimObjectVector(child):
> >>>>>>         for obj in child:
> >>>>>>             obj.adoptOrphanParams()
> >>>>>>     elif isSimObjectOrVector(child):
> >>>>>>         child.adoptOrphanParams()
> >>>>>>>
> >>>>>>> On Fri, Feb 11, 2011 at 11:05 PM, Joel Hestness wrote:
> >>>>>>>> Hi Sheng,
> >>>>>>>> I've dug back through some of my simulations, and I haven't been
> >>>>>>>> able to find a case where I used 4GB of simulated memory, so I don't
> >>>>>>>> know if I have a baseline to show that the checkpoint restore works
> >>>>>>>> with that much memory. On the other hand, I have simulated with
> >>>>>>>> 512MB and 1GB of simulated memory, and it has worked fine. For
> >>>>>>>> full-system simulations, we often mount a swap disk in the simulated
> >>>>>>>> system in order to avoid the small virtual memory constraints
> >>>>>>>> imposed by the operating system. I'd have to defer to others on the
> >>>>>>>> list for knowledge about whether that would work with SE mode.
> >>>>>>>> I can attempt to address your other questions as well:
> >>>>>>>> 1) The way that you described the O3 parameters is how I have set
> >>>>>>>> them in the past, so that should work.
> >>>>>>>> 2) I've seen this problem before... It has had to do with the way
> >>>>>>>> that certain SimObjects are instantiated as children of other
> >>>>>>>> SimObjects at the beginning of the simulation, and with checkpoint
> >>>>>>>> restore, this isn't the cleanest process. When I ran into this
> >>>>>>>> problem, I was working on getting x86 timing mode working with Ruby,
> >>>>>>>> and Brad Beckmann was able to help me debug. He might be able to
> >>>>>>>> suggest first steps for figuring out what's wrong here.
> >>>>>>>> Hope this helps,
> >>>>>>>> Joel
> >>>>>>>>
> >>>>>>>> On Wed, Feb 9, 2011 at 3:14 PM, Sheng Li wrote:
> >>>>>>>>> And two other questions:
> >>>>>>>>>
> >>>>>>>>> 1. What should I do to change the O3 parameters such as issueWidth,
> >>>>>>>>> commitWidth, etc? I added a few lines in se.py as below. It runs
> >>>>>>>>> fine if I just run the benchmarks, but if I resume a checkpoint
> >>>>>>>>> (created without -d option), then it will complain the CPU class
> >>>>>>>>> has no such parameters. I think these parameters can only be set
> >>>>>>>>> after M5 performs CPU mode switch, then how can I set these
> >>>>>>>>> parameters so that M5 will use them after switching CPU mode?
> >>>>>>>>>
> >>>>>>>>> if options.detailed:
> >>>>>>>>>     CPUClass.commitWidth = 4
> >>>>>>>>>     CPUClass.decodeWidth = 4
> >>>>>>>>>     CPUClass.dispatchWidth = 4
> >>>>>>>>>     CPUClass.fetchWidth = 4
> >>>>>>>>>     CPUClass.issueWidth = 4
> >>>>>>>>>     CPUClass.commitWidth = 4
> >>>>>>>>>     CPUClass.renameWidth = 4
> >>>>>>>>>     CPUClass.squashWidth = 4
> >>>>>>>>>     CPUClass.wbWidth = 4
> >>>>>>>>>     CPUClass.numROBEntries = 128
> >>>>>>>>>     CPUClass.numIQEntries = 36
> >>>>>>>>>     CPUClass.LQEntries = 48
> >>>>>>>>>
> >>>>>>>>> 2. When I resume a checkpoint with -d --caches options, I got
> >>>>>>>>> RuntimeError: Attempt to instantiate orphan node. I am trying to
> >>>>>>>>> figure out what the orphan node is. What should I do to find the
> >>>>>>>>> orphan node? I tried "print self.name" in File
> >>>>>>>>> "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py",
> >>>>>>>>> line 822, in getCCObject, but got nothing.
> >>>>>>>>>
> >>>>>>>>> command line: ./build/ALPHA_SE/m5.opt configs/example/se.py --bench bzip2 --checkpoint-restore=0 --simpoint -d --caches --l2cache
> >>>>>>>>> 2200
> >>>>>>>>> m5out/cpt.bzip2.2200
> >>>>>>>>> Global frequency set at 1000000000000 ticks per second
> >>>>>>>>> Traceback (most recent call last):
> >>>>>>>>>   File "<string>", line 1, in ?
> >>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/main.py", line 359, in main
> >>>>>>>>>     exec filecode in scope
> >>>>>>>>>   File "configs/example/se.py", line 179, in ?
> >>>>>>>>>     Simulation.run(options, root, system, FutureClass)
> >>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-work-stable/configs/common/Simulation.py", line 236, in run
> >>>>>>>>>     m5.instantiate(checkpoint_dir)
> >>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-work-stable/src/python/m5/simulate.py", line 77, in instantiate
> >>>>>>>>>     for obj in root.descendants(): obj.createCCObject()
> >>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 841, in createCCObject
> >>>>>>>>>     def createCCObject(self):
> >>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 796, in getCCParams
> >>>>>>>>>     value = value.getValue()
> >>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 845, in getValue
> >>>>>>>>>     def getValue(self):
> >>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 826, in getCCObject
> >>>>>>>>>     self._ccObject = -1
> >>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 796, in getCCParams
> >>>>>>>>>     value = value.getValue()
> >>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/params.py", line 183, in getValue
> >>>>>>>>>     return [ v.getValue() for v in self ]
> >>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 845, in getValue
> >>>>>>>>>     def getValue(self):
> >>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 822, in getCCObject
> >>>>>>>>>     #print self.name
> >>>>>>>>> RuntimeError: Attempt to instantiate orphan node
> >>>>>>>>>
> >>>>>>>>> Thanks a lot!
> >>>>>>>>> -Sheng
> >>
> >> _______________________________________________
> >> m5-users mailing list
> >> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
> >
> > --
> > Give our ability to our work, but our genius to our life!

--
Give our ability to our work, but our genius to our life!
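The adoption rule Steve describes in the thread, and the order dependence Rick reports, can be illustrated with a toy model. This is only a sketch under stated assumptions: `Node`, `adopt_orphan_params`, `descendants` and `workload_parent` are hypothetical stand-ins written for illustration, not M5's SimObject code, and assigning to `params` directly is just a convenient way to leave an object without a parent.

```python
class Node:
    """Toy stand-in for a configuration object (hypothetical, not M5's SimObject)."""

    def __init__(self, name):
        self.name = name
        self.parent = None   # an object can have at most one parent
        self.children = {}   # child name -> Node
        self.params = {}     # param name -> value (possibly a Node)

    def add_child(self, name, child):
        if child.parent is not None:
            raise RuntimeError(
                "add_child('%s'): child '%s' already has parent '%s'"
                % (name, child.name, child.parent.name))
        child.parent = self
        self.children[name] = child

    def adopt_orphan_params(self):
        # Any Node-valued param that still has no parent becomes our child.
        for name, value in self.params.items():
            if isinstance(value, Node) and value.parent is None:
                self.add_child(name, value)

    def descendants(self):
        # Pre-order traversal: this node first, then its children. The child
        # list is read only after the caller has finished with this node, so
        # children adopted during the visit are traversed as well.
        yield self
        for child in list(self.children.values()):
            yield from child.descendants()


def workload_parent(cpu_order):
    """Build root -> CPUs in the given order, all sharing one orphan workload.

    Returns the name of the node that ends up adopting the workload and the
    order in which nodes were visited.
    """
    root = Node("root")
    workload = Node("workload")
    for name in cpu_order:
        cpu = Node(name)
        cpu.params["workload"] = workload  # raw assignment: workload stays an orphan
        root.add_child(name, cpu)
    visited = []
    for node in root.descendants():
        visited.append(node.name)
        node.adopt_orphan_params()
    return workload.parent.name, visited


if __name__ == "__main__":
    print(workload_parent(["cpu", "switch_cpus"]))
    print(workload_parent(["switch_cpus", "cpu"]))
```

In this toy model the workload is adopted by whichever CPU the traversal reaches first, so swapping the order in which the two CPUs are attached changes the workload's parent. The newly adopted workload is still visited, because the pre-order walk reads a node's children only after that node has been processed.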
