Doubtful... this patch really only addresses the problem of restoring
into a configuration with switch_cpus, and doesn't do anything
specific to ARM.

Steve

On Sun, Apr 10, 2011 at 5:13 PM, Griffin Wright <[email protected]> wrote:
> Is there any chance this patch also fixes the problems with checkpointing in
> ARM_SE? :)
>
> I still haven't been able to fix the issues with checkpointing in atomic and
> restoring to either atomic or timing, so I'm just poking my head out here to
> see if anyone else has tackled the problem.
>
> -Griffin Wright
>
> On 4/10/2011 8:05 PM, Steve Reinhardt wrote:
>>
>> Hi Leonard,
>>
>> I do have a patch for this issue, but haven't gotten around to pushing it
>> yet:
>> http://reviews.m5sim.org/r/608/
>>
>> If you have the chance to download it from reviewboard and verify that
>> it solves your problem that would be helpful.
>>
>> Steve
>>
>>
>> On Thu, Apr 7, 2011 at 4:53 PM, Sage<[email protected]>  wrote:
>>>
>>> Hi, Steve and Rick,
>>>
>>> You two discussed the problem of resuming a checkpoint when --l2cache and
>>> --detailed are both specified. But I noticed that the problem occurs when
>>> --l2cache and (--timing or --detailed) are specified but it won't happen if
>>> only the "--l2cache" option is there. Have you figured out a way of solving
>>> the problem?
>>>
>>> Thanks,
>>> Leonard
>>>
>>>
>>>
>>> On Wed, Mar 2, 2011 at 9:32 AM, Steve Reinhardt<[email protected]>  wrote:
>>>>
>>>> FYI, I finally got around to reproducing this, and I think I see what the
>>>> problem is.  Unfortunately I don't see a really trivial fix, but I've got
>>>> some ideas I'll work on to see if I can take care of it.
>>>>
>>>> Steve
>>>>
>>>> On Fri, Feb 18, 2011 at 5:21 AM, Steve Reinhardt<[email protected]>
>>>>  wrote:
>>>>>
>>>>> BTW, thanks for the detailed example... I've been traveling, but I'll see
>>>>> if I can reproduce this when I get home.
>>>>>
>>>>> Steve
>>>>>
>>>>> On Thu, Feb 17, 2011 at 11:21 AM, Richard Strong<[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> Here is the process I went through on a fresh checkout of m5 this morning.
>>>>>>
>>>>>> (1) hg clone http://repo.m5sim.org/m5
>>>>>>
>>>>>> (2) cd m5
>>>>>>
>>>>>> (3) scons build/ALPHA_SE/m5.opt
>>>>>>
>>>>>> (4) build/ALPHA_SE/m5.opt configs/example/se.py --take-checkpoint=1 --at-instruction
>>>>>>
>>>>>> (5) build/ALPHA_SE/m5.opt configs/example/se.py --checkpoint-restore=1 --at-instruction -d --caches --l2cache
>>>>>> M5 Simulator System
>>>>>>
>>>>>> Copyright (c) 2001-2008
>>>>>> The Regents of The University of Michigan
>>>>>> All Rights Reserved
>>>>>>
>>>>>>
>>>>>> M5 compiled Feb 17 2011 09:41:58
>>>>>> M5 revision 96bde0910197+ 8031+ default tip
>>>>>> M5 started Feb 17 2011 09:54:32
>>>>>> M5 executing on rstrong-desktop
>>>>>> command line: build/ALPHA_SE/m5.opt configs/example/se.py --checkpoint-restore=1 --at-instruction -d --caches --l2cache
>>>>>> Global frequency set at 1000000000000 ticks per second
>>>>>> 0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
>>>>>> Switch at curTick count:10000
>>>>>> info: Entering event queue @ 1000.  Starting simulation...
>>>>>> panic: Tried to access unmapped address 0x12008b488.
>>>>>>  @ cycle 2500
>>>>>> [invoke:build/ALPHA_SE/arch/alpha/faults.cc, line 208]
>>>>>> Memory Usage: 586300 KBytes
>>>>>> For more information see: http://www.m5sim.org/panic/5932f339
>>>>>> Program aborted at cycle 2500
>>>>>> Aborted
>>>>>>
>>>>>> The problem seen in the output of (5) above is caused by the workload
>>>>>> being adopted by switch_cpus as its parent, as opposed to system.cpu. My
>>>>>> original fix was to modify simulate.py to adopt orphans in sorted order,
>>>>>> but this appears to create orphans for fuPool, as shown in the snippet of
>>>>>> config.ini below. This makes me think that something is broken in the
>>>>>> design, since whether certain objects become orphans, and whether
>>>>>> checkpoint files work, depends on the order in which objects come up. Is
>>>>>> there any way to explicitly set the parent/child relationship if you want
>>>>>> to avoid this nondeterminism?
>>>>>>
>>>>>> config.ini selected output:
>>>>>> [system.switch_cpus.fuPool]
>>>>>> type=FUPool
>>>>>> FUList=(orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan)
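
The order dependence Rick describes can be reproduced in a few lines. This is only a toy sketch, not m5 code: `Node`, `adopt_orphan_params`, and `owner_of_workload` are hypothetical names, and the only assumption carried over from the thread is that an orphan param value is adopted by whichever object visits it first.

```python
class Node:
    """Toy stand-in for a SimObject (hypothetical, not m5's class)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.params = {}

    def adopt_orphan_params(self):
        # Adopt every param value that does not have a parent yet.
        for value in self.params.values():
            if value.parent is None:
                value.parent = self


def owner_of_workload(visit_order, explicit_parent=None):
    """Share one workload between two CPUs, then adopt in the given order."""
    workload = Node("workload")
    cpus = {"cpu": Node("cpu"), "switch_cpus": Node("switch_cpus")}
    for cpu in cpus.values():
        cpu.params["workload"] = workload
    if explicit_parent is not None:
        # Fixing the parent up front makes the visiting order irrelevant.
        workload.parent = cpus[explicit_parent]
    for name in visit_order:
        cpus[name].adopt_orphan_params()
    return workload.parent.name


print(owner_of_workload(["cpu", "switch_cpus"]))
print(owner_of_workload(["switch_cpus", "cpu"]))
print(owner_of_workload(["switch_cpus", "cpu"], explicit_parent="cpu"))
```

In this toy model the first two calls disagree about the workload's parent, while the third does not depend on the visiting order. Whether m5 offers an equivalent way to pin the parent is exactly the open question above.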
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Feb 17, 2011 at 5:45 AM, Steve Reinhardt<[email protected]>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Rick,
>>>>>>>
>>>>>>> I'm a little confused by your statement "there is no recursion to add
>>>>>>> the children of params".  Being a param value and being a child are
>>>>>>> separate things, because an object A can be a param of many other
>>>>>>> objects but can only be the child of one other object.  The only
>>>>>>> relationship between the two is that if A is set as a param value for a
>>>>>>> param of B and A does not have a parent, then A will also implicitly be
>>>>>>> set as a child of B.  (See towards the end of SimObject.__setattr__().)
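
The param-versus-child distinction can be illustrated with a small toy model. This is a sketch under stated assumptions, not the real SimObject code: `Node`, `set_param`, and `path` are hypothetical names, and `set_param` only mimics the implicit-adoption rule described above.

```python
class Node:
    """Toy stand-in for a SimObject (hypothetical, not m5's class)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = {}
        self.params = {}

    def set_param(self, key, value):
        # Any number of objects may hold `value` as a param ...
        self.params[key] = value
        # ... but only an owner that finds it parentless adopts it.
        if value.parent is None:
            value.parent = self
            self.children[key] = value

    def path(self):
        # Dotted path from the root down to this node.
        if self.parent is None:
            return self.name
        return self.parent.path() + "." + self.name


system = Node("system")
cpu = Node("cpu")
switch_cpu = Node("switch_cpus")
system.set_param("cpu", cpu)
system.set_param("switch_cpus", switch_cpu)

workload = Node("workload")
cpu.set_param("workload", workload)         # adopted here
switch_cpu.set_param("workload", workload)  # param only, no second parent

print(workload.path())
```

Here the workload is a param of both CPUs but a child of only the first one that received it.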
>>>>>>>
>>>>>>> So every SimObject param value *should* be the child of *some*
>>>>>>> SimObject, so iterating over param values shouldn't be necessary.  The
>>>>>>> whole point of adoptOrphanParams() is to make sure this is true; it's
>>>>>>> the one place we iterate over all the param values, just to make sure
>>>>>>> that they all have parents (and to set them if they don't).
>>>>>>>
>>>>>>> Also, the adoptOrphanParams() method traverses the whole tree (see
>>>>>>> simulate.py) using the descendants() call, which is a pre-order
>>>>>>> traversal, so any new children that are added at a particular node
>>>>>>> should be traversed automatically.
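
A minimal sketch of why a pre-order walk picks up children adopted along the way. The names are hypothetical stand-ins rather than the actual m5 implementation, and FUList is simplified to a single object instead of a vector.

```python
class Node:
    """Toy stand-in for a SimObject (hypothetical, not m5's class)."""

    def __init__(self, name, **params):
        self.name = name
        self.parent = None
        self.children = {}
        # Params are stored without adoption, so every value starts orphaned.
        self.params = dict(params)

    def adopt_orphan_params(self):
        # Give a parent to every param value that does not have one yet.
        for key, value in self.params.items():
            if isinstance(value, Node) and value.parent is None:
                value.parent = self
                self.children[key] = value

    def descendants(self):
        # Pre-order: yield this node first, then read its children, so
        # children added while the caller handles this node are still seen.
        yield self
        for child in list(self.children.values()):
            yield from child.descendants()


fu = Node("fu0")
pool = Node("fuPool", FUList=fu)
cpu = Node("cpu", fuPool=pool)
root = Node("root", cpu=cpu)

visited = []
for obj in root.descendants():
    obj.adopt_orphan_params()
    visited.append(obj.name)

print(visited)
```

Every node is reached even though no child links existed before the loop started, because each node adopts its params before the walk descends into its children.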
>>>>>>>
>>>>>>> Your configuration should not be affected by whether you're restoring
>>>>>>> from a checkpoint or not... the config gets built first, then if
>>>>>>> there's a checkpoint it gets restored.
>>>>>>>
>>>>>>> I rewrote all this code last summer to clean it up, so I'm very
>>>>>>> interested in figuring out where the bugs are.
>>>>>>>
>>>>>>> Steve
>>>>>>>
>>>>>>> On Wed, Feb 16, 2011 at 9:48 PM, Richard Strong<[email protected]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> I took a close look at this problem because the same thing happens to
>>>>>>>> me. It only occurs when I use the O3CPU model when resuming from a
>>>>>>>> checkpoint. What I find is that config.ini shows (orphan) for the
>>>>>>>> FUList parameter of the O3CPU model. Further, none of the function
>>>>>>>> units are adopted by fuPool. I think the problem lies in
>>>>>>>> SimObject.py::add_child(self, name, child) and
>>>>>>>> SimObject.py::adoptOrphanParams(self). I think that there is no
>>>>>>>> recursion to add the children of params. I tried a simple change at the
>>>>>>>> end of add_child that calls adoptOrphanParams() on the child (change
>>>>>>>> shown below). This allows the setup code to get further, but now I die
>>>>>>>> with:
>>>>>>>>
>>>>>>>> "AttributeError: 'AnyProxy' object has no attribute 'getValue'". I was
>>>>>>>> wondering if someone knows what is going wrong? Did a recent change
>>>>>>>> forget to go down enough recursive levels when adopting children nodes?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> -Rick
>>>>>>>>
>>>>>>>> def add_child(self, name, child):
>>>>>>>>         print "\t in add_child name=%s child=%s"%(name, child)
>>>>>>>>         child = coerceSimObjectOrVector(child)
>>>>>>>>         if child.get_parent():
>>>>>>>>             raise RuntimeError, \
>>>>>>>>                   "add_child('%s'): child '%s' already has parent '%s'" % \
>>>>>>>>                   (name, child._name, child._parent)
>>>>>>>>         if self._children.has_key(name):
>>>>>>>>             # This code path had an undiscovered bug that would make it fail
>>>>>>>>             # at runtime. It had been here for a long time and was only
>>>>>>>>             # exposed by a buggy script. Changes here will probably not be
>>>>>>>>             # exercised without specialized testing.
>>>>>>>>             self.clear_child(name)
>>>>>>>>         child.set_parent(self, name)
>>>>>>>>         self._children[name] = child
>>>>>>>>         if isSimObjectVector(child):
>>>>>>>>             for obj in child:
>>>>>>>>                 obj.adoptOrphanParams()
>>>>>>>>         elif isSimObjectOrVector(child):
>>>>>>>>             child.adoptOrphanParams()
>>>>>>>>>
>>>>>>>>> On Fri, Feb 11, 2011 at 11:05 PM, Joel Hestness
>>>>>>>>> <[email protected]>  wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Sheng,
>>>>>>>>>>   I've dug back through some of my simulations, and I haven't been
>>>>>>>>>> able to find a case where I used 4GB of simulated memory, so I don't
>>>>>>>>>> know if I have a baseline to show that the checkpoint restore works
>>>>>>>>>> with that much memory.  On the other hand, I have simulated with 512MB
>>>>>>>>>> and 1GB of simulated memory, and it has worked fine.  For full-system
>>>>>>>>>> simulations, we often mount a swap disk in the simulated system in
>>>>>>>>>> order to avoid the small virtual memory constraints imposed by the
>>>>>>>>>> operating system.  I'd have to defer to others on the list for
>>>>>>>>>> knowledge about whether that would work with SE mode.
>>>>>>>>>>   I can attempt to address your other questions as well:
>>>>>>>>>>    1) The way that you described the O3 parameters is how I have set
>>>>>>>>>> them in the past, so that should work.
>>>>>>>>>>    2) I've seen this problem before... It has had to do with the way
>>>>>>>>>> that certain SimObjects are instantiated as children of other
>>>>>>>>>> SimObjects at the beginning of the simulation, and with checkpoint
>>>>>>>>>> restore, this isn't the cleanest process.  When I ran into this
>>>>>>>>>> problem, I was working on getting x86 timing mode working with Ruby,
>>>>>>>>>> and Brad Beckmann was able to help me debug.  He might be able to
>>>>>>>>>> suggest first steps for figuring out what's wrong here.
>>>>>>>>>>   Hope this helps,
>>>>>>>>>>   Joel
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Feb 9, 2011 at 3:14 PM, Sheng Li<[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> And two other questions:
>>>>>>>>>>>
>>>>>>>>>>> 1. What should I do to change the O3 parameters such as issueWidth,
>>>>>>>>>>> commitWidth, etc.? I added a few lines in se.py as below. It runs
>>>>>>>>>>> fine if I just run the benchmarks, but if I resume a checkpoint
>>>>>>>>>>> (created without the -d option), it complains that the CPU class has
>>>>>>>>>>> no such parameters. I think these parameters can only be set after M5
>>>>>>>>>>> performs the CPU mode switch, so how can I set these parameters so
>>>>>>>>>>> that M5 will use them after switching CPU mode?
>>>>>>>>>>>
>>>>>>>>>>>  if options.detailed:
>>>>>>>>>>>     CPUClass.commitWidth    = 4
>>>>>>>>>>>     CPUClass.decodeWidth    = 4
>>>>>>>>>>>     CPUClass.dispatchWidth  = 4
>>>>>>>>>>>     CPUClass.fetchWidth     = 4
>>>>>>>>>>>     CPUClass.issueWidth     = 4
>>>>>>>>>>>     CPUClass.commitWidth    = 4
>>>>>>>>>>>     CPUClass.renameWidth    = 4
>>>>>>>>>>>     CPUClass.squashWidth    = 4
>>>>>>>>>>>     CPUClass.wbWidth        = 4
>>>>>>>>>>>     CPUClass.numROBEntries  = 128
>>>>>>>>>>>     CPUClass.numIQEntries   = 36
>>>>>>>>>>>     CPUClass.LQEntries      = 48
>>>>>>>>>>>
>>>>>>>>>>> 2. When I resume a checkpoint with the -d --caches options, I get
>>>>>>>>>>> "RuntimeError: Attempt to instantiate orphan node". I am trying to
>>>>>>>>>>> figure out what the orphan node is. What should I do to find the
>>>>>>>>>>> orphan node? I tried "print self.name" in File
>>>>>>>>>>> "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py",
>>>>>>>>>>> line 822, in getCCObject, but got nothing.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> command line: ./build/ALPHA_SE/m5.opt configs/example/se.py --bench bzip2 --checkpoint-restore=0 --simpoint -d --caches --l2cache 2200 m5out/cpt.bzip2.2200
>>>>>>>>>>> Global frequency set at 1000000000000 ticks per second
>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>   File "<string>", line 1, in ?
>>>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/main.py", line 359, in main
>>>>>>>>>>>     exec filecode in scope
>>>>>>>>>>>   File "configs/example/se.py", line 179, in ?
>>>>>>>>>>>     Simulation.run(options, root, system, FutureClass)
>>>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-work-stable/configs/common/Simulation.py", line 236, in run
>>>>>>>>>>>     m5.instantiate(checkpoint_dir)
>>>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-work-stable/src/python/m5/simulate.py", line 77, in instantiate
>>>>>>>>>>>     for obj in root.descendants(): obj.createCCObject()
>>>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 841, in createCCObject
>>>>>>>>>>>     def createCCObject(self):
>>>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 796, in getCCParams
>>>>>>>>>>>     value = value.getValue()
>>>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 845, in getValue
>>>>>>>>>>>     def getValue(self):
>>>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 826, in getCCObject
>>>>>>>>>>>     self._ccObject = -1
>>>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 796, in getCCParams
>>>>>>>>>>>     value = value.getValue()
>>>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/params.py", line 183, in getValue
>>>>>>>>>>>     return [ v.getValue() for v in self ]
>>>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 845, in getValue
>>>>>>>>>>>     def getValue(self):
>>>>>>>>>>>   File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 822, in getCCObject
>>>>>>>>>>>     #print self.name
>>>>>>>>>>> RuntimeError: Attempt to instantiate orphan node
>>>>>>>>>>>
>>>>>>>>>>> Thanks a lot!
>>>>>>>>>>> -Sheng
>>>>
>>>> _______________________________________________
>>>> m5-users mailing list
>>>> [email protected]
>>>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
>>>
>>>
>>> --
>>> Give our ability to our work, but our genius to our life!
>>>
>>>
>>
>>
>
>
