Hi Leonard,

I do have a patch for this issue, but haven't gotten around to pushing it yet:
http://reviews.m5sim.org/r/608/

If you have the chance to download it from reviewboard and verify that
it solves your problem, that would be helpful.

Steve


On Thu, Apr 7, 2011 at 4:53 PM, Sage <[email protected]> wrote:
> Hi, Steve and Rick,
>
> You two discussed the problem of resuming a checkpoint when --l2cache and
> --detailed are both specified. But I noticed that the problem occurs when
> --l2cache and (--timing or --detailed) are specified but it won't happen if
> only the "--l2cache" option is there. Have you figured out a way of solving
> the problem?
>
> Thanks,
> Leonard
>
>
>
> On Wed, Mar 2, 2011 at 9:32 AM, Steve Reinhardt <[email protected]> wrote:
>>
>> FYI, I finally got around to reproducing this, and I think I see what the
>> problem is.  Unfortunately I don't see a really trivial fix, but I've got
>> some ideas I'll work on to see if I can take care of it.
>>
>> Steve
>>
>> On Fri, Feb 18, 2011 at 5:21 AM, Steve Reinhardt <[email protected]> wrote:
>>>
>>> BTW, thanks for the detailed example... I've been traveling, but I'll see
>>> if I can reproduce this when I get home.
>>>
>>> Steve
>>>
>>> On Thu, Feb 17, 2011 at 11:21 AM, Richard Strong <[email protected]>
>>> wrote:
>>>>
>>>> Here is the process I went through on a fresh checkout of m5 this
>>>> morning.
>>>>
>>>> (1) hg clone http://repo.m5sim.org/m5
>>>>
>>>> (2) cd m5
>>>>
>>>> (3) scons build/ALPHA_SE/m5.opt
>>>>
>>>> (4) build/ALPHA_SE/m5.opt configs/example/se.py --take-checkpoint=1 --at-instruction
>>>>
>>>> (5) build/ALPHA_SE/m5.opt configs/example/se.py --checkpoint-restore=1 --at-instruction -d --caches --l2cache
>>>> M5 Simulator System
>>>>
>>>> Copyright (c) 2001-2008
>>>> The Regents of The University of Michigan
>>>> All Rights Reserved
>>>>
>>>>
>>>> M5 compiled Feb 17 2011 09:41:58
>>>> M5 revision 96bde0910197+ 8031+ default tip
>>>> M5 started Feb 17 2011 09:54:32
>>>> M5 executing on rstrong-desktop
>>>> command line: build/ALPHA_SE/m5.opt configs/example/se.py
>>>> --checkpoint-restore=1 --at-instruction -d --caches --l2cache
>>>> Global frequency set at 1000000000000 ticks per second
>>>> 0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
>>>> Switch at curTick count:10000
>>>> info: Entering event queue @ 1000.  Starting simulation...
>>>> panic: Tried to access unmapped address 0x12008b488.
>>>>  @ cycle 2500
>>>> [invoke:build/ALPHA_SE/arch/alpha/faults.cc, line 208]
>>>> Memory Usage: 586300 KBytes
>>>> For more information see: http://www.m5sim.org/panic/5932f339
>>>> Program aborted at cycle 2500
>>>> Aborted
>>>>
>>>> The problem seen in the output of (5) above is caused by the workload
>>>> being adopted by switch_cpus as its parent instead of system.cpu. My
>>>> original fix was to modify simulate.py to adopt orphans in sorted order,
>>>> but this appears to create orphans for fuPool, as shown in the snippet of
>>>> config.ini below. This makes me think that something is broken in the
>>>> design, since whether certain objects become orphans, and whether
>>>> checkpoint files work, depends on the order in which objects come up. Is
>>>> there any way to explicitly set the parent-child relationship to avoid
>>>> this nondeterminism?
>>>>
>>>> config.ini selected output:
>>>> [system.switch_cpus.fuPool]
>>>> type=FUPool
>>>> FUList=(orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan)
>>>>
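
The ordering dependence Rick describes can be illustrated with a minimal toy model. This is a sketch only: the Obj class, adopt_all() and build() are hypothetical names invented for this illustration and are not m5's SimObject code. It assumes that an orphan referenced as a param by two objects is adopted by whichever object is visited first, and that a parent set explicitly beforehand is left alone.

```python
class Obj:
    """Hypothetical stand-in for a configuration object (not m5's SimObject)."""

    def __init__(self, name):
        self.name = name
        self.parent = None   # at most one parent
        self.params = []     # objects referenced as param values


def adopt_all(objects):
    """Visit objects in the given order; the first holder of an orphan adopts it."""
    for obj in objects:
        for value in obj.params:
            if value.parent is None:
                value.parent = obj


def build():
    """Create a workload that is a param value of two different CPUs."""
    workload = Obj("workload")
    cpu = Obj("system.cpu")
    switch_cpu = Obj("switch_cpus")
    cpu.params.append(workload)
    switch_cpu.params.append(workload)
    return workload, cpu, switch_cpu


# Same configuration, two visiting orders, two different parents.
workload, cpu, switch_cpu = build()
adopt_all([cpu, switch_cpu])
parent_a = workload.parent.name

workload, cpu, switch_cpu = build()
adopt_all([switch_cpu, cpu])
parent_b = workload.parent.name

# Setting the parent explicitly first makes the result order-independent.
workload, cpu, switch_cpu = build()
workload.parent = cpu
adopt_all([switch_cpu, cpu])
parent_explicit = workload.parent.name

print(parent_a, parent_b, parent_explicit)
```

In this toy, the only way to get a stable result regardless of visiting order is to assign the parent before any implicit adoption runs, which is the kind of explicit control the question above asks about.
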
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Feb 17, 2011 at 5:45 AM, Steve Reinhardt <[email protected]>
>>>> wrote:
>>>>>
>>>>> Hi Rick,
>>>>>
>>>>> I'm a little confused by your statement "there is no recursion to add
>>>>> the children of params".  Being a param value and being a child are
>>>>> separate things, because an object A can be a param of many other objects
>>>>> but can only be the child of one other object.  The only relationship
>>>>> between the two is that if A is set as a param value for a param of B and
>>>>> A does not have a parent, then A will also implicitly be set as a child of
>>>>> B.  (See towards the end of SimObject.__setattr__().)
>>>>>
>>>>> So every SimObject param value *should* be the child of *some*
>>>>> SimObject, so iterating over param values shouldn't be necessary.  The
>>>>> whole point of adoptOrphanParams() is to make sure this is true; it's the
>>>>> one place we iterate over all the param values, just to make sure that
>>>>> they all have parents (and to set them if they don't).
>>>>>
>>>>> Also, the adoptOrphanParams() method traverses the whole tree (see
>>>>> simulate.py) using the descendants() call, which is a pre-order
>>>>> traversal, so any new children that are added at a particular node should
>>>>> be traversed automatically.
>>>>>
>>>>> Your configuration should not be affected by whether you're restoring
>>>>> from a checkpoint or not... the config gets built first, then if there's a
>>>>> checkpoint it gets restored.
>>>>>
>>>>> I rewrote all this code last summer to clean it up, so I'm very
>>>>> interested in figuring out where the bugs are.
>>>>>
>>>>> Steve
>>>>>
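
The param/child distinction and the pre-order adoption pass that Steve describes can be sketched with a small toy class. This is an illustration under stated assumptions, not m5's implementation: Node, set_param_only() and adopt_orphan_params() are hypothetical names, and set_param_only() exists only to simulate a param value that was left without a parent.

```python
class Node:
    """Toy stand-in for a SimObject; hypothetical, not m5's actual class."""

    def __init__(self, label):
        # Bypass our own __setattr__ for bookkeeping fields.
        object.__setattr__(self, "label", label)
        object.__setattr__(self, "parent", None)
        object.__setattr__(self, "children", {})
        object.__setattr__(self, "params", {})

    def add_child(self, name, child):
        if child.parent is not None:
            raise RuntimeError("%s already has a parent" % child.label)
        object.__setattr__(child, "parent", self)
        self.children[name] = child

    def __setattr__(self, name, value):
        if isinstance(value, Node):
            # A Node assigned as an attribute is recorded as a param value...
            self.params[name] = value
            # ...and is implicitly adopted only if it has no parent yet.
            if value.parent is None:
                self.add_child(name, value)
        else:
            object.__setattr__(self, name, value)

    def set_param_only(self, name, value):
        # Hypothetical helper: record a param without adopting it (an orphan).
        self.params[name] = value

    def adopt_orphan_params(self):
        # Give every parentless param value this node as its parent.
        for name, value in self.params.items():
            if value.parent is None:
                self.add_child(name, value)

    def descendants(self):
        # Pre-order: yield this node before walking its children.
        yield self
        for child in self.children.values():
            yield from child.descendants()


root = Node("root")
cpu = Node("cpu")
other = Node("other")
root.cpu = cpu            # cpu has no parent, so root adopts it
root.other = other
other.cpu_ref = cpu       # cpu is a param of other but stays root's child

pool = Node("fuPool")
unit = Node("fu0")
cpu.set_param_only("fuPool", pool)   # simulated orphans
pool.set_param_only("unit", unit)

visited = []
for node in root.descendants():
    visited.append(node.label)
    node.adopt_orphan_params()

print(visited)
```

Because each node is yielded before its children are walked, children adopted while visiting a node are picked up later in the same traversal, so nested orphans such as fu0 are adopted without any extra recursion.
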
>>>>> On Wed, Feb 16, 2011 at 9:48 PM, Richard Strong <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> I took a close look at this problem because the same thing happens to
>>>>>> me. It only occurs when I use the O3CPU model when resuming from a
>>>>>> checkpoint. What I find is that config.ini has (orphan) for the FUList
>>>>>> parameter of the O3CPU model. Further, none of the function units are
>>>>>> adopted by fuPool. I think the problem lies in
>>>>>> SimObject.py::add_child(self, name, child) and
>>>>>> SimObject.py::adoptOrphanParams(self). I think that there is no recursion
>>>>>> to add the children of params. I tried a simple change at the end of
>>>>>> add_child: calling adoptOrphanParams() on the child (change shown below).
>>>>>> This allows the setup code to get further, but now I die with:
>>>>>>
>>>>>> "AttributeError: 'AnyProxy' object has no attribute 'getValue'". I was
>>>>>> wondering if someone knows what is going wrong? Did a recent change forget
>>>>>> to go down enough recursive levels when adopting child nodes?
>>>>>>
>>>>>> Best,
>>>>>> -Rick
>>>>>>
>>>>>>     def add_child(self, name, child):
>>>>>>         print "\t in add_child name=%s child=%s" % (name, child)
>>>>>>         child = coerceSimObjectOrVector(child)
>>>>>>         if child.get_parent():
>>>>>>             raise RuntimeError, \
>>>>>>                   "add_child('%s'): child '%s' already has parent '%s'" % \
>>>>>>                   (name, child._name, child._parent)
>>>>>>         if self._children.has_key(name):
>>>>>>             # This code path had an undiscovered bug that would make it fail
>>>>>>             # at runtime. It had been here for a long time and was only
>>>>>>             # exposed by a buggy script. Changes here will probably not be
>>>>>>             # exercised without specialized testing.
>>>>>>             self.clear_child(name)
>>>>>>         child.set_parent(self, name)
>>>>>>         self._children[name] = child
>>>>>>         # Experimental change: also adopt orphan params of the new child.
>>>>>>         if isSimObjectVector(child):
>>>>>>             for obj in child:
>>>>>>                 obj.adoptOrphanParams()
>>>>>>         elif isSimObjectOrVector(child):
>>>>>>             child.adoptOrphanParams()
>>>>>>>
>>>>>>> On Fri, Feb 11, 2011 at 11:05 PM, Joel Hestness
>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi Sheng,
>>>>>>>>   I've dug back through some of my simulations, and I haven't been
>>>>>>>> able to find a case where I used 4GB of simulated memory, so I don't
>>>>>>>> know if I have a baseline to show that the checkpoint restore works with
>>>>>>>> that much memory.  On the other hand, I have simulated with 512MB and
>>>>>>>> 1GB of simulated memory, and it has worked fine.  For full-system
>>>>>>>> simulations, we often mount a swap disk in the simulated system in order
>>>>>>>> to avoid the small virtual memory constraints imposed by the operating
>>>>>>>> system.  I'd have to defer to others on the list for knowledge about
>>>>>>>> whether that would work with SE mode.
>>>>>>>>   I can attempt to address your other questions as well:
>>>>>>>>    1) The way that you described the O3 parameters is how I have set
>>>>>>>> them in the past, so that should work.
>>>>>>>>    2) I've seen this problem before... It has had to do with the way
>>>>>>>> that certain SimObjects are instantiated as children of other SimObjects
>>>>>>>> at the beginning of the simulation, and with checkpoint restore, this
>>>>>>>> isn't the cleanest process.  When I ran into this problem, I was working
>>>>>>>> on getting x86 timing mode working with Ruby, and Brad Beckmann was able
>>>>>>>> to help me debug.  He might be able to suggest first steps for figuring
>>>>>>>> out what's wrong here.
>>>>>>>>   Hope this helps,
>>>>>>>>   Joel
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Feb 9, 2011 at 3:14 PM, Sheng Li <[email protected]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> And two other questions:
>>>>>>>>>
>>>>>>>>> 1. What should I do to change the O3 parameters such as issueWidth,
>>>>>>>>> commitWidth, etc.? I added a few lines in se.py as below. It runs fine
>>>>>>>>> if I just run the benchmarks, but if I resume a checkpoint (created
>>>>>>>>> without the -d option), it complains that the CPU class has no such
>>>>>>>>> parameters. I think these parameters can only be set after M5 performs
>>>>>>>>> the CPU mode switch, so how can I set them so that M5 will use them
>>>>>>>>> after switching CPU mode?
>>>>>>>>>
>>>>>>>>>  if options.detailed:
>>>>>>>>>     CPUClass.commitWidth    = 4
>>>>>>>>>     CPUClass.decodeWidth    = 4
>>>>>>>>>     CPUClass.dispatchWidth  = 4
>>>>>>>>>     CPUClass.fetchWidth     = 4
>>>>>>>>>     CPUClass.issueWidth     = 4
>>>>>>>>>     CPUClass.renameWidth    = 4
>>>>>>>>>     CPUClass.squashWidth    = 4
>>>>>>>>>     CPUClass.wbWidth        = 4
>>>>>>>>>     CPUClass.numROBEntries  = 128
>>>>>>>>>     CPUClass.numIQEntries   = 36
>>>>>>>>>     CPUClass.LQEntries      = 48
>>>>>>>>>
>>>>>>>>> 2. When I resume a checkpoint with the -d --caches options, I get
>>>>>>>>> "RuntimeError: Attempt to instantiate orphan node". I am trying to
>>>>>>>>> figure out what the orphan node is. What should I do to find it? I
>>>>>>>>> tried "print self.name" in getCCObject (File
>>>>>>>>> "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py",
>>>>>>>>> line 822), but got nothing.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> command line: ./build/ALPHA_SE/m5.opt configs/example/se.py --bench bzip2 --checkpoint-restore=0 --simpoint -d --caches --l2cache
>>>>>>>>> 2200
>>>>>>>>> m5out/cpt.bzip2.2200
>>>>>>>>> Global frequency set at 1000000000000 ticks per second
>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>   File "<string>", line 1, in ?
>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/main.py", line 359, in main
>>>>>>>>>     exec filecode in scope
>>>>>>>>>   File "configs/example/se.py", line 179, in ?
>>>>>>>>>     Simulation.run(options, root, system, FutureClass)
>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-work-stable/configs/common/Simulation.py", line 236, in run
>>>>>>>>>     m5.instantiate(checkpoint_dir)
>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-work-stable/src/python/m5/simulate.py", line 77, in instantiate
>>>>>>>>>     for obj in root.descendants(): obj.createCCObject()
>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 841, in createCCObject
>>>>>>>>>     def createCCObject(self):
>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 796, in getCCParams
>>>>>>>>>     value = value.getValue()
>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 845, in getValue
>>>>>>>>>     def getValue(self):
>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 826, in getCCObject
>>>>>>>>>     self._ccObject = -1
>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 796, in getCCParams
>>>>>>>>>     value = value.getValue()
>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/params.py", line 183, in getValue
>>>>>>>>>     return [ v.getValue() for v in self ]
>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 845, in getValue
>>>>>>>>>     def getValue(self):
>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 822, in getCCObject
>>>>>>>>>     #print self.name
>>>>>>>>> RuntimeError: Attempt to instantiate orphan node
>>>>>>>>>
>>>>>>>>> Thanks a lot!
>>>>>>>>> -Sheng
>>>>>
>>>>
>>>
>>
>>
>> _______________________________________________
>> m5-users mailing list
>> [email protected]
>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
>
>
>
> --
> Give our ability to our work, but our genius to our life!
>
