Ah, it's becoming clearer now... I had forgotten the details of our earlier
discussion about the order of objects in the adoptOrphanParams loop.

I'm pretty sure it's your sorted() fix that's causing the problem with
fuPool, and the problem is one of not recursing properly, as you said.  As I
mentioned earlier, root.descendants() does a lazy pre-order traversal, so if
you add things to a node's child list while that node is being visited, the
added things will still get visited.  By turning that into
sorted(root.descendants()) you force the whole tree to be walked before
anything is added, which is why you miss some of the nodes.  I'm glad to
understand that better.
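
As a rough illustration of that difference, here is a toy sketch in plain
Python 3. Node, adopt_orphan_params, and the tree shape are invented for the
example and are not the m5 classes; the only assumption is that adoption adds
children to the node currently being visited.

```python
class Node:
    """Toy stand-in for a SimObject (hypothetical, not the m5 class)."""

    def __init__(self, name, params=()):
        self.name = name
        self.parent = None
        self.children = {}
        self.params = list(params)  # objects referenced as param values

    def add_child(self, child):
        child.parent = self
        self.children[child.name] = child

    def descendants(self):
        # Lazy pre-order walk: the child loop starts only after the caller
        # has finished with this node, so newly added children are seen.
        yield self
        for child in self.children.values():
            yield from child.descendants()


def adopt_orphan_params(node):
    # Parent every param value of this node that has no parent yet.
    for obj in node.params:
        if obj.parent is None:
            node.add_child(obj)


def build_tree():
    units = [Node("fu0"), Node("fu1")]
    fu_pool = Node("fuPool", params=units)
    cpu = Node("cpu", params=[fu_pool])
    root = Node("root")
    root.add_child(cpu)
    return root, units


def orphans_after(walk):
    root, units = build_tree()
    for node in walk(root):
        adopt_orphan_params(node)
    return [u.name for u in units if u.parent is None]


lazy_orphans = orphans_after(lambda root: root.descendants())
eager_orphans = orphans_after(
    lambda root: sorted(root.descendants(), key=lambda n: n.name))
print(lazy_orphans)
print(eager_orphans)
```

In this toy tree the lazy walk reaches fuPool after cpu adopts it, so the
units get parents. The sorted walk takes its snapshot while only root and cpu
exist, never visits fuPool, and leaves the units unparented, much like the
(orphan) entries in FUList.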

You can explicitly set A as a child of B simply by saying:
  B.name = A
where name is any name you want to give the child, as long as it doesn't
conflict with a param name.
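
A toy model of that assignment rule, in plain Python 3, might look like the
sketch below. ToyObject, _param_names, and the attribute name proc are
hypothetical and only mimic the behavior described in this thread (a param
value is adopted only if it has no parent yet); this is not the code in
SimObject.py.

```python
class ToyObject:
    """Hypothetical miniature of the parenting rule; not m5's SimObject."""

    _param_names = ("workload",)  # names treated as params in this toy

    def __init__(self):
        # Bypass our own __setattr__ while creating bookkeeping fields.
        object.__setattr__(self, "_parent", None)
        object.__setattr__(self, "_name", None)
        object.__setattr__(self, "_children", {})
        object.__setattr__(self, "_values", {})

    def _add_child(self, name, child):
        if child._parent is not None:
            raise RuntimeError("child already has a parent")
        object.__setattr__(child, "_parent", self)
        object.__setattr__(child, "_name", name)
        self._children[name] = child

    def __setattr__(self, name, value):
        if name in self._param_names:
            self._values[name] = value
            # A param value is adopted only if it has no parent yet.
            if isinstance(value, ToyObject) and value._parent is None:
                self._add_child(name, value)
        elif isinstance(value, ToyObject):
            # "B.name = A": a non-param name makes A an explicit child of B.
            self._add_child(name, value)
        else:
            object.__setattr__(self, name, value)


cpu, switch_cpu, process = ToyObject(), ToyObject(), ToyObject()
cpu.proc = process             # explicit parenting, done first
cpu.workload = process         # already parented: stays with cpu
switch_cpu.workload = process  # shared param value, parent unchanged
print(process._parent is cpu, process._name)
```

Because the explicit assignment happens before either param assignment, the
parent no longer depends on which CPU happens to be visited first.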

As I mentioned at the end of our first discussion on this (around 1/25), I'm
not clear on why the workload is not getting parented for you, as the
example scripts should make that happen.  But I do think that this is the
right solution, to make sure the parent relationships are set up properly in
the script rather than leaving it to adoptOrphanParams().

Steve

On Thu, Feb 17, 2011 at 11:41 AM, Richard Strong <[email protected]> wrote:

> I forgot to mention that I was able to get around the orphan problem by
> changing descendants() to the code below. Basically, this causes cpu to be
> visited before switch_cpus, and the simulation goes through. This is not a
> general solution because if the switch_cpu is renamed to a_switch_cpu, then
> the workload will be attached to the wrong core again.
>
> -Rick
>
> def descendants(self):
>         #print "in descendants name=", self.get_name()
>         yield self
>
>         for child in sorted(self._children.itervalues(),
>                             key=lambda o: o.get_name()):
>             for obj in child.descendants():
>                 yield obj
>
>
>
> On Thu, Feb 17, 2011 at 11:21 AM, Richard Strong <[email protected]> wrote:
>
>> Here is the process I went through on a fresh checkout of m5 this morning.
>>
>> (1) hg clone http://repo.m5sim.org/m5
>>
>> (2) cd m5
>>
>> (3) scons build/ALPHA_SE/m5.opt
>>
>> (4) build/ALPHA_SE/m5.opt  configs/example/se.py  --take-checkpoint=1
>> --at-instruction
>>
>> (5) build/ALPHA_SE/m5.opt  configs/example/se.py  --checkpoint-restore=1
>> --at-instruction  -d --caches --l2cache
>> M5 Simulator System
>>
>> Copyright (c) 2001-2008
>> The Regents of The University of Michigan
>> All Rights Reserved
>>
>>
>> M5 compiled Feb 17 2011 09:41:58
>> M5 revision 96bde0910197+ 8031+ default tip
>> M5 started Feb 17 2011 09:54:32
>> M5 executing on rstrong-desktop
>> command line: build/ALPHA_SE/m5.opt configs/example/se.py
>> --checkpoint-restore=1 --at-instruction -d --caches --l2cache
>>
>> Global frequency set at 1000000000000 ticks per second
>> 0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
>> Switch at curTick count:10000
>> info: Entering event queue @ 1000.  Starting simulation...
>> panic: Tried to access unmapped address 0x12008b488.
>>  @ cycle 2500
>> [invoke:build/ALPHA_SE/arch/alpha/faults.cc, line 208]
>> Memory Usage: 586300 KBytes
>> For more information see: http://www.m5sim.org/panic/5932f339
>> Program aborted at cycle 2500
>> Aborted
>>
>> The problem seen in the output of (5) above is caused by the workload
>> being adopted by switch_cpus as its parent instead of system.cpu. My
>> original fix was to modify simulate.py to adopt orphans in sorted order,
>> but this appears to create orphans for fuPool, as shown in the snippet of
>> config.ini below. This makes me think that something is broken in the
>> design, since whether certain objects become orphans, and whether
>> checkpoint files work, depends on the order in which objects come up. Is
>> there any way to explicitly set the parent/child relationship if you want
>> to avoid this nondeterminism?
>>
>> config.ini selected output:
>> [system.switch_cpus.fuPool]
>> type=FUPool
>> FUList=(orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan)
>> (orphan) (orphan)
>>
>>
>>
>>
>>
>>
>> On Thu, Feb 17, 2011 at 5:45 AM, Steve Reinhardt <[email protected]> wrote:
>>
>>> Hi Rick,
>>>
>>> I'm a little confused by your statement "there is no recursion to add the
>>> children of params".  Being a param value and being a child are separate
>>> things, because an object A can be a param of many other objects but can
>>> only be the child of one other object.  The only relationship between the
>>> two is that if A is set as a param value for a param of B and A does not
>>> have a parent, then A will also implicitly be set as a child of B.  (See
>>> towards the end of SimObject.__setattr__().)
>>>
>>> So every SimObject param value *should* be the child of *some* SimObject,
>>> so iterating over param values shouldn't be necessary.  The whole point of
>>> adoptOrphanParams() is to make sure this is true; it's the one place we
>>> iterate over all the param values, just to make sure that they all have
>>> parents (and to set them if they don't).
>>>
>>> Also, the adoptOrphanParams() method traverses the whole tree (see
>>> simulate.py) using the descendants() call which is a pre-order traversal, so
>>> any new children that are added at a particular node should be traversed
>>> automatically.
>>>
>>> Your configuration should not be affected by whether you're restoring
>>> from a checkpoint or not... the config gets built first, then if there's a
>>> checkpoint it gets restored.
>>>
>>> I rewrote all this code last summer to clean it up, so I'm very
>>> interested in figuring out where the bugs are.
>>>
>>> Steve
>>>
>>>
>>> On Wed, Feb 16, 2011 at 9:48 PM, Richard Strong <[email protected]> wrote:
>>>
>>>> I took a close look at this problem because the same thing happens to
>>>> me. It only occurs when I use the O3CPU model when resuming from a
>>>> checkpoint. What I find is that config.ini has orphan for the FUList
>>>> parameter of the O3CPU model. Further, none of the function units are
>>>> adopted by fuPool. I think the problem lies in
>>>> SimObject.py::add_child(self, name, child) and
>>>> SimObject.py::adoptOrphanParams(self). I think that there is no
>>>> recursion to add the children of params. I tried a simple change at the
>>>> end of add_child so that it calls adoptOrphanParams() on the child
>>>> (change shown below). This allows the setup code to get further, but now
>>>> I die with:
>>>>
>>>> "AttributeError: 'AnyProxy' object has no attribute 'getValue'". I was
>>>> wondering if someone knows what is going wrong? Did a recent change
>>>> forget to go down enough recursive levels when adopting children nodes?
>>>>
>>>> Best,
>>>> -Rick
>>>>
>>>> def add_child(self, name, child):
>>>>         print "\t in add_child name=%s child=%s"%(name, child)
>>>>         child = coerceSimObjectOrVector(child)
>>>>         if child.get_parent():
>>>>             raise RuntimeError, \
>>>>                   "add_child('%s'): child '%s' already has parent '%s'" % \
>>>>                   (name, child._name, child._parent)
>>>>         if self._children.has_key(name):
>>>>             # This code path had an undiscovered bug that would make it fail
>>>>             # at runtime. It had been here for a long time and was only
>>>>             # exposed by a buggy script. Changes here will probably not be
>>>>             # exercised without specialized testing.
>>>>             self.clear_child(name)
>>>>         child.set_parent(self, name)
>>>>         self._children[name] = child
>>>>         if isSimObjectVector(child):
>>>>             for obj in child:
>>>>                 obj.adoptOrphanParams()
>>>>         elif isSimObjectOrVector(child):
>>>>             child.adoptOrphanParams()
>>>>
>>>>>
>>>>>
>>>>> On Fri, Feb 11, 2011 at 11:05 PM, Joel Hestness <[email protected]> wrote:
>>>>>
>>>>>> Hi Sheng,
>>>>>>   I've dug back through some of my simulations, and I haven't been
>>>>>> able to find a case where I used 4GB of simulated memory, so I don't
>>>>>> know if I have a baseline to show that the checkpoint restore works
>>>>>> with that much memory.  On the other hand, I have simulated with 512MB
>>>>>> and 1GB of simulated memory, and it has worked fine.  For full-system
>>>>>> simulations, we often mount a swap disk in the simulated system in
>>>>>> order to avoid the small virtual memory constraints imposed by the
>>>>>> operating system.  I'd have to defer to others on the list for
>>>>>> knowledge about whether that would work with SE mode.
>>>>>>   I can attempt to address your other questions as well:
>>>>>>    1) The way that you described the O3 parameters is how I have set
>>>>>> them in the past, so that should work.
>>>>>>    2) I've seen this problem before... It has had to do with the way
>>>>>> that certain SimObjects are instantiated as children of other
>>>>>> SimObjects at the beginning of the simulation, and with checkpoint
>>>>>> restore, this isn't the cleanest process.  When I ran into this
>>>>>> problem, I was working on getting x86 timing mode working with Ruby,
>>>>>> and Brad Beckmann was able to help me debug.  He might be able to
>>>>>> suggest first steps for figuring out what's wrong here.
>>>>>>   Hope this helps,
>>>>>>   Joel
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 9, 2011 at 3:14 PM, Sheng Li <[email protected]> wrote:
>>>>>>
>>>>>>> And two other questions:
>>>>>>>
>>>>>>> 1. What should I do to change the O3 parameters such as issueWidth,
>>>>>>> commitWidth, etc.? I added a few lines in se.py as below. It runs
>>>>>>> fine if I just run the benchmarks, but if I resume a checkpoint
>>>>>>> (created without the -d option), then it complains that the CPU
>>>>>>> class has no such parameters. I think these parameters can only be
>>>>>>> set after M5 performs the CPU mode switch, so how can I set them so
>>>>>>> that M5 will use them after switching CPU mode?
>>>>>>>
>>>>>>>  if options.detailed:
>>>>>>>     CPUClass.commitWidth    = 4
>>>>>>>     CPUClass.decodeWidth    = 4
>>>>>>>     CPUClass.dispatchWidth  = 4
>>>>>>>     CPUClass.fetchWidth     = 4
>>>>>>>     CPUClass.issueWidth     = 4
>>>>>>>     CPUClass.commitWidth    = 4
>>>>>>>     CPUClass.renameWidth    = 4
>>>>>>>     CPUClass.squashWidth    = 4
>>>>>>>     CPUClass.wbWidth        = 4
>>>>>>>     CPUClass.numROBEntries  = 128
>>>>>>>     CPUClass.numIQEntries   = 36
>>>>>>>     CPUClass.LQEntries      = 48
>>>>>>>
>>>>>>> 2. When I resume a checkpoint with -d --caches options, I got
>>>>>>> RuntimeError: Attempt to instantiate orphan node. I am trying to
>>>>>>> figure out what the orphan node is. What should I do to find the
>>>>>>> orphan node? I tried "print self.name" in File
>>>>>>> "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py",
>>>>>>> line 822, in getCCObject, but got nothing.
>>>>>>>
>>>>>>>
>>>>>>> command line: ./build/ALPHA_SE/m5.opt configs/example/se.py --bench
>>>>>>> bzip2 --checkpoint-restore=0 --simpoint -d --caches --l2cache
>>>>>>> 2200
>>>>>>> m5out/cpt.bzip2.2200
>>>>>>>
>>>>>>> Global frequency set at 1000000000000 ticks per second
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "<string>", line 1, in ?
>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/main.py", line 359, in main
>>>>>>>     exec filecode in scope
>>>>>>>   File "configs/example/se.py", line 179, in ?
>>>>>>>     Simulation.run(options, root, system, FutureClass)
>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-work-stable/configs/common/Simulation.py", line 236, in run
>>>>>>>     m5.instantiate(checkpoint_dir)
>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-work-stable/src/python/m5/simulate.py", line 77, in instantiate
>>>>>>>     for obj in root.descendants(): obj.createCCObject()
>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 841, in createCCObject
>>>>>>>     def createCCObject(self):
>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 796, in getCCParams
>>>>>>>     value = value.getValue()
>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 845, in getValue
>>>>>>>     def getValue(self):
>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 826, in getCCObject
>>>>>>>     self._ccObject = -1
>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 796, in getCCParams
>>>>>>>     value = value.getValue()
>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/params.py", line 183, in getValue
>>>>>>>     return [ v.getValue() for v in self ]
>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 845, in getValue
>>>>>>>     def getValue(self):
>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 822, in getCCObject
>>>>>>>     #print self.name
>>>>>>> RuntimeError: Attempt to instantiate orphan node
>>>>>>>
>>>>>>> Thanks a lot!
>>>>>>> -Sheng
>>>>>>>
>>>>>>
>>>
>>
>
_______________________________________________
m5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users