I just pushed a patch that should fix it.

Ali

On Apr 10, 2011, at 7:13 PM, Griffin Wright wrote:

> Is there any chance this patch also fixes the problems with checkpointing in
> ARM_SE? :)
>
> I still haven't been able to fix the issues with checkpointing in atomic and
> restoring to either atomic or timing, so I'm just poking my head out here to
> see if anyone else has tackled the problem.
>
> -Griffin Wright
>
> On 4/10/2011 8:05 PM, Steve Reinhardt wrote:
>> Hi Leonard,
>>
>> I do have a patch for this issue, but haven't gotten around to pushing it
>> yet:
>> http://reviews.m5sim.org/r/608/
>>
>> If you have the chance to download it from reviewboard and verify that
>> it solves your problem that would be helpful.
>>
>> Steve
>>
>> On Thu, Apr 7, 2011 at 4:53 PM, Sage <[email protected]> wrote:
>>> Hi, Steve and Rick,
>>>
>>> You two discussed the problem of resuming a checkpoint when --l2cache and
>>> --detailed are both specified. But I noticed that the problem occurs when
>>> --l2cache and (--timing or --detailed) are specified but it won't happen if
>>> only the "--l2cache" option is there. Have you figured out a way of solving
>>> the problem?
>>>
>>> Thanks,
>>> Leonard
>>>
>>> On Wed, Mar 2, 2011 at 9:32 AM, Steve Reinhardt <[email protected]> wrote:
>>>> FYI, I finally got around to reproducing this, and I think I see what the
>>>> problem is. Unfortunately I don't see a really trivial fix, but I've got
>>>> some ideas I'll work on to see if I can take care of it.
>>>>
>>>> Steve
>>>>
>>>> On Fri, Feb 18, 2011 at 5:21 AM, Steve Reinhardt <[email protected]> wrote:
>>>>> BTW, thanks for the detailed example... I've been traveling, but I'll see
>>>>> if I can reproduce this when I get home.
>>>>>
>>>>> Steve
>>>>>
>>>>> On Thu, Feb 17, 2011 at 11:21 AM, Richard Strong <[email protected]> wrote:
>>>>>> Here is the process I went through on a fresh checkout of m5 this
>>>>>> morning.
>>>>>>
>>>>>> (1) hg clone http://repo.m5sim.org/m5
>>>>>>
>>>>>> (2) cd m5
>>>>>>
>>>>>> (3) scons build/ALPHA_SE/m5.opt
>>>>>>
>>>>>> (4) build/ALPHA_SE/m5.opt configs/example/se.py --take-checkpoint=1 --at-instruction
>>>>>>
>>>>>> (5) build/ALPHA_SE/m5.opt configs/example/se.py --checkpoint-restore=1 --at-instruction -d --caches --l2cache
>>>>>> M5 Simulator System
>>>>>>
>>>>>> Copyright (c) 2001-2008
>>>>>> The Regents of The University of Michigan
>>>>>> All Rights Reserved
>>>>>>
>>>>>>
>>>>>> M5 compiled Feb 17 2011 09:41:58
>>>>>> M5 revision 96bde0910197+ 8031+ default tip
>>>>>> M5 started Feb 17 2011 09:54:32
>>>>>> M5 executing on rstrong-desktop
>>>>>> command line: build/ALPHA_SE/m5.opt configs/example/se.py --checkpoint-restore=1 --at-instruction -d --caches --l2cache
>>>>>> Global frequency set at 1000000000000 ticks per second
>>>>>> 0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
>>>>>> Switch at curTick count:10000
>>>>>> info: Entering event queue @ 1000. Starting simulation...
>>>>>> panic: Tried to access unmapped address 0x12008b488.
>>>>>> @ cycle 2500
>>>>>> [invoke:build/ALPHA_SE/arch/alpha/faults.cc, line 208]
>>>>>> Memory Usage: 586300 KBytes
>>>>>> For more information see: http://www.m5sim.org/panic/5932f339
>>>>>> Program aborted at cycle 2500
>>>>>> Aborted
>>>>>>
>>>>>> The problem seen in the output of (5) above is caused by the workload
>>>>>> being adopted by switch_cpus as its parent as opposed to system.cpu. My
>>>>>> original fix was to modify simulate.py to adopt orphans in sorted order,
>>>>>> but this appears to create orphans for fuPool as shown in the snippet of
>>>>>> config.ini below. This makes me think that something is broken in the
>>>>>> design, as it depends on the order in which objects come up whether
>>>>>> certain objects become orphans or whether checkpoint files work. Is there
>>>>>> any way to explicitly set the parent/child relationship if you want to
>>>>>> avoid this nondeterminism?
>>>>>>
>>>>>> config.ini selected output:
>>>>>> [system.switch_cpus.fuPool]
>>>>>> type=FUPool
>>>>>> FUList=(orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan)
>>>>>>
>>>>>> On Thu, Feb 17, 2011 at 5:45 AM, Steve Reinhardt <[email protected]> wrote:
>>>>>>> Hi Rick,
>>>>>>>
>>>>>>> I'm a little confused by your statement "there is no recursion to add
>>>>>>> the children of params". Being a param value and being a child are
>>>>>>> separate things, because an object A can be a param of many other
>>>>>>> objects but can only be the child of one other object. The only
>>>>>>> relationship between the two is that if A is set as a param value for a
>>>>>>> param of B and A does not have a parent, then A will also implicitly be
>>>>>>> set as a child of B. (See towards the end of SimObject.__setattr__().)
>>>>>>>
>>>>>>> So every SimObject param value *should* be the child of *some*
>>>>>>> SimObject, so iterating over param values shouldn't be necessary. The
>>>>>>> whole point of adoptOrphanParams() is to make sure this is true; it's
>>>>>>> the one place we iterate over all the param values, just to make sure
>>>>>>> that they all have parents (and to set them if they don't).
>>>>>>>
>>>>>>> Also, the adoptOrphanParams() method traverses the whole tree (see
>>>>>>> simulate.py) using the descendants() call which is a pre-order
>>>>>>> traversal, so any new children that are added at a particular node
>>>>>>> should be traversed automatically.
>>>>>>>
>>>>>>> Your configuration should not be affected by whether you're restoring
>>>>>>> from a checkpoint or not... the config gets built first, then if
>>>>>>> there's a checkpoint it gets restored.
>>>>>>>
>>>>>>> I rewrote all this code last summer to clean it up, so I'm very
>>>>>>> interested in figuring out where the bugs are.
>>>>>>>
>>>>>>> Steve
>>>>>>>
>>>>>>> On Wed, Feb 16, 2011 at 9:48 PM, Richard Strong <[email protected]> wrote:
>>>>>>>> I took a close look at this problem because the same thing happens to
>>>>>>>> me. It only occurs when I use the O3CPU model when resuming from a
>>>>>>>> checkpoint. What I find is that config.ini has orphan for the FUList
>>>>>>>> parameter of the O3CPU model. Further, none of the function units are
>>>>>>>> adopted by fuPool. I think the problem lies in
>>>>>>>> SimObject.py::add_child(self, name, child) and
>>>>>>>> SimObject.py::adoptOrphanParams(self). I think that there is no
>>>>>>>> recursion to add the children of params. I tried a simple change at
>>>>>>>> the end of add_child: calling adoptOrphanParams() on the child (change
>>>>>>>> shown below). This allows the setup code to get further but now I die
>>>>>>>> with:
>>>>>>>>
>>>>>>>> "AttributeError: 'AnyProxy' object has no attribute 'getValue'". I was
>>>>>>>> wondering if someone knows what is going wrong? Did a recent change
>>>>>>>> forget to go down enough recursive levels when adopting children nodes?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> -Rick
>>>>>>>>
>>>>>>>> def add_child(self, name, child):
>>>>>>>>     print "\t in add_child name=%s child=%s"%(name, child)
>>>>>>>>     child = coerceSimObjectOrVector(child)
>>>>>>>>     if child.get_parent():
>>>>>>>>         raise RuntimeError, \
>>>>>>>>               "add_child('%s'): child '%s' already has parent '%s'" % \
>>>>>>>>               (name, child._name, child._parent)
>>>>>>>>     if self._children.has_key(name):
>>>>>>>>         # This code path had an undiscovered bug that would make it fail
>>>>>>>>         # at runtime. It had been here for a long time and was only
>>>>>>>>         # exposed by a buggy script. Changes here will probably not be
>>>>>>>>         # exercised without specialized testing.
>>>>>>>>         self.clear_child(name)
>>>>>>>>     child.set_parent(self, name)
>>>>>>>>     self._children[name] = child
>>>>>>>>     if isSimObjectVector(child):
>>>>>>>>         for obj in child:
>>>>>>>>             obj.adoptOrphanParams()
>>>>>>>>     elif isSimObjectOrVector(child):
>>>>>>>>         child.adoptOrphanParams()
>>>>>>>>
>>>>>>>>> On Fri, Feb 11, 2011 at 11:05 PM, Joel Hestness <[email protected]> wrote:
>>>>>>>>>> Hi Sheng,
>>>>>>>>>>
>>>>>>>>>> I've dug back through some of my simulations, and I haven't been
>>>>>>>>>> able to find a case where I used 4GB of simulated memory, so I
>>>>>>>>>> don't know if I have a baseline to show that the checkpoint restore
>>>>>>>>>> works with that much memory. On the other hand, I have simulated
>>>>>>>>>> with 512MB and 1GB of simulated memory, and it has worked fine. For
>>>>>>>>>> full-system simulations, we often mount a swap disk in the
>>>>>>>>>> simulated system in order to avoid the small virtual memory
>>>>>>>>>> constraints imposed by the operating system. I'd have to defer to
>>>>>>>>>> others on the list for knowledge about whether that would work with
>>>>>>>>>> SE mode.
>>>>>>>>>>
>>>>>>>>>> I can attempt to address your other questions as well:
>>>>>>>>>>
>>>>>>>>>> 1) The way that you described the O3 parameters is how I have set
>>>>>>>>>> them in the past, so that should work.
>>>>>>>>>>
>>>>>>>>>> 2) I've seen this problem before... It has had to do with the way
>>>>>>>>>> that certain SimObjects are instantiated as children of other
>>>>>>>>>> SimObjects at the beginning of the simulation, and with checkpoint
>>>>>>>>>> restore, this isn't the cleanest process. When I ran into this
>>>>>>>>>> problem, I was working on getting x86 timing mode working with
>>>>>>>>>> Ruby, and Brad Beckmann was able to help me debug. He might be able
>>>>>>>>>> to suggest first steps for figuring out what's wrong here.
>>>>>>>>>>
>>>>>>>>>> Hope this helps,
>>>>>>>>>> Joel
>>>>>>>>>>
>>>>>>>>>> On Wed, Feb 9, 2011 at 3:14 PM, Sheng Li <[email protected]> wrote:
>>>>>>>>>>> And two other questions:
>>>>>>>>>>>
>>>>>>>>>>> 1. What should I do to change the O3 parameters such as issueWidth,
>>>>>>>>>>> commitWidth, etc? I added a few lines in se.py as below. It runs
>>>>>>>>>>> fine if I just run the benchmarks, but if I resume a checkpoint
>>>>>>>>>>> (created without -d option), then it will complain the CPU class
>>>>>>>>>>> has no such parameters. I think these parameters can only be set
>>>>>>>>>>> after M5 performs CPU mode switch, then how can I set these
>>>>>>>>>>> parameters so that M5 will use them after switching CPU mode?
>>>>>>>>>>>
>>>>>>>>>>> if options.detailed:
>>>>>>>>>>>     CPUClass.commitWidth = 4
>>>>>>>>>>>     CPUClass.decodeWidth = 4
>>>>>>>>>>>     CPUClass.dispatchWidth = 4
>>>>>>>>>>>     CPUClass.fetchWidth = 4
>>>>>>>>>>>     CPUClass.issueWidth = 4
>>>>>>>>>>>     CPUClass.commitWidth = 4
>>>>>>>>>>>     CPUClass.renameWidth = 4
>>>>>>>>>>>     CPUClass.squashWidth = 4
>>>>>>>>>>>     CPUClass.wbWidth = 4
>>>>>>>>>>>     CPUClass.numROBEntries = 128
>>>>>>>>>>>     CPUClass.numIQEntries = 36
>>>>>>>>>>>     CPUClass.LQEntries = 48
>>>>>>>>>>>
>>>>>>>>>>> 2. When I resume a checkpoint with -d --caches options, I got
>>>>>>>>>>> RuntimeError: Attempt to instantiate orphan node. I am trying to
>>>>>>>>>>> figure out what the orphan node is. What should I do to find the
>>>>>>>>>>> orphan node? I tried "print self.name" in File
>>>>>>>>>>> "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py",
>>>>>>>>>>> line 822, in getCCObject, but got nothing.
>>>>>>>>>>>
>>>>>>>>>>> command line: ./build/ALPHA_SE/m5.opt configs/example/se.py --bench bzip2 --checkpoint-restore=0 --simpoint -d --caches --l2cache
>>>>>>>>>>> 2200
>>>>>>>>>>> m5out/cpt.bzip2.2200
>>>>>>>>>>> Global frequency set at 1000000000000 ticks per second
>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>   File "<string>", line 1, in ?
>>>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/main.py", line 359, in main
>>>>>>>>>>>     exec filecode in scope
>>>>>>>>>>>   File "configs/example/se.py", line 179, in ?
>>>>>>>>>>>     Simulation.run(options, root, system, FutureClass)
>>>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-work-stable/configs/common/Simulation.py", line 236, in run
>>>>>>>>>>>     m5.instantiate(checkpoint_dir)
>>>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-work-stable/src/python/m5/simulate.py", line 77, in instantiate
>>>>>>>>>>>     for obj in root.descendants(): obj.createCCObject()
>>>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 841, in createCCObject
>>>>>>>>>>>     def createCCObject(self):
>>>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 796, in getCCParams
>>>>>>>>>>>     value = value.getValue()
>>>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 845, in getValue
>>>>>>>>>>>     def getValue(self):
>>>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 826, in getCCObject
>>>>>>>>>>>     self._ccObject = -1
>>>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 796, in getCCParams
>>>>>>>>>>>     value = value.getValue()
>>>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/params.py", line 183, in getValue
>>>>>>>>>>>     return [ v.getValue() for v in self ]
>>>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 845, in getValue
>>>>>>>>>>>     def getValue(self):
>>>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 822, in getCCObject
>>>>>>>>>>>     #print self.name
>>>>>>>>>>> RuntimeError: Attempt to instantiate orphan node
>>>>>>>>>>>
>>>>>>>>>>> Thanks a lot!
>>>>>>>>>>> -Sheng
>>>>
>>>> _______________________________________________
>>>> m5-users mailing list
>>>> [email protected]
>>>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
>>>
>>> --
>>> Give our ability to our work, but our genius to our life!

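The parent/child rule Steve describes in the thread above (an object can be a
param value of many objects but the child of only one, and a parentless param
value is adopted by whichever object claims it first) can be illustrated with
a small standalone sketch. This is not the M5 code. The Node class, its
set_param(), adopt_orphan_params() and descendants() methods, and the object
names are hypothetical stand-ins that only mirror the terms used in the
thread, and the sketch is written for Python 3 rather than the Python 2 that
M5 used at the time.

```python
class Node:
    """Hypothetical stand-in for a SimObject; not the M5 class."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = {}  # child name -> Node (at most one parent each)
        self.params = {}    # param name -> Node (may be shared)

    def add_child(self, name, child):
        if child.parent is not None:
            raise RuntimeError("%s already has parent %s"
                               % (child.name, child.parent.name))
        child.parent = self
        self.children[name] = child

    def set_param(self, name, value):
        # Setting a param implicitly adopts the value only if it is
        # still parentless, so the first setter becomes the parent.
        self.params[name] = value
        if value.parent is None:
            self.add_child(name, value)

    def adopt_orphan_params(self):
        # Safety net: give every still-parentless param value a parent.
        for pname, value in self.params.items():
            if value.parent is None:
                self.add_child(pname, value)

    def descendants(self):
        # Pre-order traversal. The children dict is read only after the
        # caller has handled this node, so children added while visiting
        # a node are still traversed.
        yield self
        for child in list(self.children.values()):
            yield from child.descendants()


root = Node("root")
cpu = Node("cpu")
switch_cpu = Node("switch_cpu")
root.set_param("cpu", cpu)
root.set_param("switch_cpu", switch_cpu)

workload = Node("workload")
cpu.set_param("workload", workload)         # cpu claims it first
switch_cpu.set_param("workload", workload)  # only a param reference

# Assumption for illustration: write the params dict directly to create
# orphans that the traversal then has to adopt.
fu_pool = Node("fuPool")
fu = Node("fu0")
fu_pool.params["FUList"] = fu
cpu.params["fuPool"] = fu_pool

visited = []
for obj in root.descendants():
    obj.adopt_orphan_params()
    visited.append(obj.name)

print(visited)
print(workload.parent.name, fu_pool.parent.name, fu.parent.name)
```

In this sketch the workload ends up under cpu only because cpu claimed it
first. Swapping the two set_param("workload", ...) calls would parent it under
switch_cpu instead, which is the kind of order dependence Rick reports.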