Is there any chance this patch also fixes the problems with checkpointing in ARM_SE? :)

I still haven't been able to fix the problems with taking checkpoints in atomic mode and restoring them in either atomic or timing mode, so I'm just poking my head out here to see whether anyone else has tackled this.

-Griffin Wright

On 4/10/2011 8:05 PM, Steve Reinhardt wrote:
Hi Leonard,

I do have a patch for this issue, but haven't gotten around to pushing it yet:
http://reviews.m5sim.org/r/608/

If you have the chance to download it from Review Board and verify that
it solves your problem, that would be helpful.

Steve


On Thu, Apr 7, 2011 at 4:53 PM, Sage wrote:
Hi, Steve and Rick,

You two discussed the problem of resuming a checkpoint when --l2cache and
--detailed are both specified. However, I noticed that the problem occurs
whenever --l2cache is combined with either --timing or --detailed, but not
when --l2cache is specified on its own. Have you figured out a way of
solving the problem?

Thanks,
Leonard



On Wed, Mar 2, 2011 at 9:32 AM, Steve Reinhardt wrote:
FYI, I finally got around to reproducing this, and I think I see what the
problem is.  Unfortunately I don't see a really trivial fix, but I've got
some ideas I'll work on to see if I can take care of it.

Steve

On Fri, Feb 18, 2011 at 5:21 AM, Steve Reinhardt wrote:
BTW, thanks for the detailed example... I've been traveling, but I'll see
if I can reproduce this when I get home.

Steve

On Thu, Feb 17, 2011 at 11:21 AM, Richard Strong wrote:
Here is the process I went through on a fresh checkout of m5 this
morning.

(1) hg clone http://repo.m5sim.org/m5

(2) cd m5

(3) scons build/ALPHA_SE/m5.opt

(4) build/ALPHA_SE/m5.opt configs/example/se.py --take-checkpoint=1 --at-instruction

(5) build/ALPHA_SE/m5.opt configs/example/se.py --checkpoint-restore=1 --at-instruction -d --caches --l2cache
M5 Simulator System

Copyright (c) 2001-2008
The Regents of The University of Michigan
All Rights Reserved


M5 compiled Feb 17 2011 09:41:58
M5 revision 96bde0910197+ 8031+ default tip
M5 started Feb 17 2011 09:54:32
M5 executing on rstrong-desktop
command line: build/ALPHA_SE/m5.opt configs/example/se.py --checkpoint-restore=1 --at-instruction -d --caches --l2cache
Global frequency set at 1000000000000 ticks per second
0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
Switch at curTick count:10000
info: Entering event queue @ 1000.  Starting simulation...
panic: Tried to access unmapped address 0x12008b488.
  @ cycle 2500
[invoke:build/ALPHA_SE/arch/alpha/faults.cc, line 208]
Memory Usage: 586300 KBytes
For more information see: http://www.m5sim.org/panic/5932f339
Program aborted at cycle 2500
Aborted

The problem seen in the output of (5) above is caused by the workload
being adopted by switch_cpus as its parent instead of system.cpu. My
original fix was to modify simulate.py to adopt orphans in sorted order,
but this appears to create orphans for fuPool, as shown in the config.ini
snippet below. This makes me think something is broken in the design:
whether certain objects end up as orphans, and whether checkpoint files
work, depends on the order in which objects are visited. Is there any way
to explicitly set the parent-child relationship to avoid this
nondeterminism?

config.ini selected output:
[system.switch_cpus.fuPool]
type=FUPool
FUList=(orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan) (orphan)
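To illustrate why the outcome can depend on visiting order, here is a toy model, not M5 code (`Node`, `adopt_all`, and `build` are hypothetical names). An object referenced as a param by two parents is adopted by whichever parent is visited first, which mirrors the workload ending up under switch_cpus rather than system.cpu. Giving the object a parent explicitly before adoption removes the order dependence in this model; that the same holds for every real M5 object is an assumption.

```python
class Node(object):
    """Toy stand-in for a SimObject: a name, an optional parent, param values."""

    def __init__(self, name, **params):
        self.name = name
        self.parent = None
        self.children = []
        self.params = params

    def add_child(self, child):
        child.parent = self
        self.children.append(child)

    def adopt_orphan_params(self):
        # First visitor wins: any param value that still lacks a parent
        # becomes a child of the object currently being visited.
        for value in self.params.values():
            if isinstance(value, Node) and value.parent is None:
                self.add_child(value)


def adopt_all(nodes):
    for node in nodes:
        node.adopt_orphan_params()


def build(order, pinned=False):
    """Return the name of the object that ends up owning the shared workload."""
    workload = Node("workload")
    cpu = Node("system.cpu", workload=workload)
    switch_cpu = Node("system.switch_cpus", workload=workload)
    if pinned:
        # Explicit parent set up front: adoption never reassigns it.
        cpu.add_child(workload)
    by_key = {"cpu": cpu, "switch": switch_cpu}
    adopt_all(by_key[key] for key in order)
    return workload.parent.name


print(build(["cpu", "switch"]))               # system.cpu
print(build(["switch", "cpu"]))               # system.switch_cpus
print(build(["switch", "cpu"], pinned=True))  # system.cpu
```

Sorting the visit order only changes which parent wins; pinning the parent before adoption is what makes the result independent of order in this sketch.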





On Thu, Feb 17, 2011 at 5:45 AM, Steve Reinhardt wrote:
Hi Rick,

I'm a little confused by your statement "there is no recursion to add
the children of params".  Being a param value and being a child are separate
things, because an object A can be a param of many other objects but can
only be the child of one other object.  The only relationship between the
two is that if A is set as a param value for a param of B and A does not
have a parent, then A will also implicitly be set as a child of B.  (See
towards the end of SimObject.__setattr__().)
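The param/child rule above can be modeled in a few lines. This is a simplified, hypothetical sketch (the `Model`, `Workload`, and `CPU` classes and the `_param_names` attribute are invented here; it is not M5's actual `SimObject.__setattr__()`): assigning a parentless object to a param also makes it a child of the assignee, while assigning the same object as a param elsewhere afterwards only sets the param.

```python
class Model(object):
    """Hypothetical, heavily simplified stand-in for m5's SimObject."""

    _param_names = ()  # names this class accepts as params

    def __init__(self):
        self._parent = None
        self._children = {}
        self._values = {}

    def add_child(self, name, child):
        if child._parent is not None:
            raise RuntimeError("add_child(%r): child already has a parent" % name)
        child._parent = self
        self._children[name] = child

    def __setattr__(self, attr, value):
        if attr.startswith("_"):
            # Internal bookkeeping bypasses the param/child logic.
            object.__setattr__(self, attr, value)
        elif attr in self._param_names:
            self._values[attr] = value
            # A param value with no parent is implicitly adopted as a child.
            if isinstance(value, Model) and value._parent is None:
                self.add_child(attr, value)
        elif isinstance(value, Model):
            # A non-param object attribute is an explicit child.
            self.add_child(attr, value)
        else:
            raise AttributeError("no param named %r" % attr)

    def __getattr__(self, attr):
        # Only called when normal lookup fails: read back param values.
        values = self.__dict__.get("_values", {})
        if attr in values:
            return values[attr]
        raise AttributeError(attr)


class Workload(Model):
    pass


class CPU(Model):
    _param_names = ("workload",)


wl = Workload()
cpu_a, cpu_b = CPU(), CPU()
cpu_a.workload = wl  # wl had no parent, so it also becomes cpu_a's child
cpu_b.workload = wl  # wl already has a parent, so this only sets the param
```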

So every SimObject param value *should* be the child of *some*
SimObject, which means iterating over param values shouldn't be necessary.  The whole
point of adoptOrphanParams() is to make sure this is true; it's the one
place we iterate over all the param values, just to make sure that they all
have parents (and to set them if they don't).

Also, the adoptOrphanParams() method traverses the whole tree (see
simulate.py) using the descendants() call which is a pre-order traversal, so
any new children that are added at a particular node should be traversed
automatically.
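A toy model of that claim (a hypothetical `Node` class and `adopt_all` helper, not M5's code) shows why a pre-order generator is sufficient: the caller adopts a node's orphan params while the generator is paused on that node, so the newly attached children are already in the list when the generator goes on to visit them. The FUList-style list param shows adoption reaching grandchildren with no extra recursion in add_child.

```python
class Node(object):
    """Toy stand-in for a SimObject: name, parent, children, param values."""

    def __init__(self, name, params=None):
        self.name = name
        self.parent = None
        self.children = []
        self.params = dict(params or {})

    def add_child(self, child):
        child.parent = self
        self.children.append(child)

    def descendants(self):
        # Pre-order: this node is yielded before its children list is read,
        # so children added while the caller handles it are still visited.
        yield self
        for child in self.children:
            for node in child.descendants():
                yield node

    def adopt_orphan_params(self):
        for value in self.params.values():
            # List-valued params (like FUList) contribute every element.
            for v in (value if isinstance(value, list) else [value]):
                if isinstance(v, Node) and v.parent is None:
                    self.add_child(v)


def adopt_all(root):
    visited = []
    for node in root.descendants():
        node.adopt_orphan_params()
        visited.append(node.name)
    return visited


op = Node("opList0")
fu0 = Node("FUList0", {"opList": [op]})
fu1 = Node("FUList1")
pool = Node("fuPool", {"FUList": [fu0, fu1]})
cpu = Node("cpu", {"fuPool": pool})
root = Node("root")
root.add_child(cpu)
print(adopt_all(root))
# ['root', 'cpu', 'fuPool', 'FUList0', 'opList0', 'FUList1']
```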

Your configuration should not be affected by whether you're restoring
from a checkpoint or not... the config gets built first, then if there's a
checkpoint it gets restored.

I rewrote all this code last summer to clean it up, so I'm very
interested in figuring out where the bugs are.

Steve

On Wed, Feb 16, 2011 at 9:48 PM, Richard Strong wrote:
I took a close look at this problem because the same thing happens to
me. It only occurs when I use the O3CPU model while resuming from a
checkpoint. What I find is that config.ini shows (orphan) for the FUList
parameter of the O3CPU model, and none of the function units are adopted
by fuPool. I think the problem lies in SimObject.py::add_child(self, name,
child) and SimObject.py::adoptOrphanParams(self): there is no recursion to
add the children of params. I tried a simple change at the end of
add_child that calls adoptOrphanParams() on the child (change shown
below). This lets the setup code get further, but now it dies with:

AttributeError: 'AnyProxy' object has no attribute 'getValue'

Does anyone know what is going wrong? Did a recent change forget to go
down enough recursive levels when adopting child nodes?

Best,
-Rick

def add_child(self, name, child):
    print "\t in add_child name=%s child=%s" % (name, child)
    child = coerceSimObjectOrVector(child)
    if child.get_parent():
        raise RuntimeError, \
              "add_child('%s'): child '%s' already has parent '%s'" % \
              (name, child._name, child._parent)
    if self._children.has_key(name):
        # This code path had an undiscovered bug that would make it fail
        # at runtime. It had been here for a long time and was only
        # exposed by a buggy script. Changes here will probably not be
        # exercised without specialized testing.
        self.clear_child(name)
    child.set_parent(self, name)
    self._children[name] = child
    # Experimental change: immediately adopt the new child's orphan
    # params (for a vector, each element's orphan params).
    if isSimObjectVector(child):
        for obj in child:
            obj.adoptOrphanParams()
    elif isSimObjectOrVector(child):
        child.adoptOrphanParams()

On Fri, Feb 11, 2011 at 11:05 PM, Joel Hestness wrote:
Hi Sheng,
   I've dug back through some of my simulations, and I haven't been
able to find a case where I used 4GB of simulated memory, so I don't know if
I have a baseline to show that the checkpoint restore works with that much
memory.  On the other hand, I have simulated with 512MB and 1GB of simulated
memory, and it has worked fine.  For full-system simulations, we often mount
a swap disk in the simulated system in order to avoid the small virtual
memory constraints imposed by the operating system.  I'd have to defer to
others on the list for knowledge about whether that would work with SE mode.
   I can attempt to address your other questions as well:
    1) The way that you described the O3 parameters is how I have set
them in the past, so that should work.
    2) I've seen this problem before... It has had to do with the way
that certain SimObjects are instantiated as children of other SimObjects at
the beginning of the simulation, and with checkpoint restore, this isn't the
cleanest process.  When I ran into this problem, I was working on getting
x86 timing mode working with Ruby, and Brad Beckmann was able to help me
debug.  He might be able to suggest first steps for figuring out what's
wrong here.
   Hope this helps,
   Joel


On Wed, Feb 9, 2011 at 3:14 PM, Sheng Li wrote:
And two other questions:

1. How should I change O3 parameters such as issueWidth, commitWidth,
etc.? I added the lines below to se.py. They work fine when I simply run
the benchmarks, but when I resume from a checkpoint (created without the
-d option), M5 complains that the CPU class has no such parameters. I
suspect these parameters only take effect after M5 performs the CPU mode
switch; how can I set them so that M5 uses them after switching CPU mode?

  if options.detailed:
      CPUClass.commitWidth    = 4
      CPUClass.decodeWidth    = 4
      CPUClass.dispatchWidth  = 4
      CPUClass.fetchWidth     = 4
      CPUClass.issueWidth     = 4
      CPUClass.renameWidth    = 4
      CPUClass.squashWidth    = 4
      CPUClass.wbWidth        = 4
      CPUClass.numROBEntries  = 128
      CPUClass.numIQEntries   = 36
      CPUClass.LQEntries      = 48
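A minimal sketch of one way to think about this, under assumptions: when restoring, the O3 settings need to land on the detailed CPU class that is switched in, not on the CPUClass that restores the checkpoint. The traceback in this message shows se.py passing a `FutureClass` to `Simulation.run()`; assuming that is the detailed class in the restore case, the overrides would go on it. `AtomicSimpleCPU` and `DerivO3CPU` below are placeholder stand-ins with made-up `None` defaults, and `detailed_cpu_class` and `apply_overrides` are hypothetical helpers, not M5 functions.

```python
class AtomicSimpleCPU(object):
    """Stand-in: an atomic CPU with no O3 pipeline parameters."""


class DerivO3CPU(object):
    """Stand-in: an O3 CPU; the None defaults are placeholders, not M5's."""
    commitWidth = None
    issueWidth = None
    numROBEntries = None


O3_OVERRIDES = {"commitWidth": 4, "issueWidth": 4, "numROBEntries": 128}


def detailed_cpu_class(cpu_class, future_class):
    # Assumption: on checkpoint restore, future_class is the CPU switched to;
    # without a restore it is None and cpu_class runs the whole simulation.
    return future_class if future_class is not None else cpu_class


def apply_overrides(cls, overrides):
    for name, value in overrides.items():
        if not hasattr(cls, name):
            # Mirrors the "no such parameter" complaint seen on restore.
            raise AttributeError("%s has no parameter %s" % (cls.__name__, name))
        setattr(cls, name, value)
    return cls


# Restore case: CPUClass restores the checkpoint, FutureClass is switched in.
target = detailed_cpu_class(AtomicSimpleCPU, DerivO3CPU)
apply_overrides(target, O3_OVERRIDES)
```

Keeping the overrides in one dict makes it easy to apply them to whichever class actually runs the detailed part of the simulation.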

2. When I resume a checkpoint with the -d --caches options, I get
"RuntimeError: Attempt to instantiate orphan node". I am trying to figure
out what the orphan node is. How can I find it? I tried adding
"print self.name" in getCCObject
("/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line
822), but it printed nothing.


command line: ./build/ALPHA_SE/m5.opt configs/example/se.py --bench bzip2 --checkpoint-restore=0 --simpoint -d --caches --l2cache
2200
m5out/cpt.bzip2.2200
Global frequency set at 1000000000000 ticks per second
Traceback (most recent call last):
  File "<string>", line 1, in ?
  File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/main.py", line 359, in main
    exec filecode in scope
  File "configs/example/se.py", line 179, in ?
    Simulation.run(options, root, system, FutureClass)
  File "/afs/crc.nd.edu/user/s/sli2/m5-work-stable/configs/common/Simulation.py", line 236, in run
    m5.instantiate(checkpoint_dir)
  File "/afs/crc.nd.edu/user/s/sli2/m5-work-stable/src/python/m5/simulate.py", line 77, in instantiate
    for obj in root.descendants(): obj.createCCObject()
  File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 841, in createCCObject
    def createCCObject(self):
  File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 796, in getCCParams
    value = value.getValue()
  File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 845, in getValue
    def getValue(self):
  File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 826, in getCCObject
    self._ccObject = -1
  File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 796, in getCCParams
    value = value.getValue()
  File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/params.py", line 183, in getValue
    return [ v.getValue() for v in self ]
  File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 845, in getValue
    def getValue(self):
  File "/afs/crc.nd.edu/user/s/sli2/m5-stable/src/python/m5/SimObject.py", line 822, in getCCObject
    #print self.name
RuntimeError: Attempt to instantiate orphan node
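Printing the orphan's own name plausibly shows nothing useful because an orphan has no parent and therefore no place in the hierarchy. An alternative is to report it from the side of the object that references it: walk the tree and list every param value that still has no parent, labeled by the owner's path and the param name. Below is a toy sketch of that walk; the `Node` class, its `path()` method, and `find_orphan_params` are hypothetical stand-ins, not M5's API, and the labels only mimic the `(orphan)` entries seen in config.ini.

```python
class Node(object):
    """Toy stand-in for a SimObject: name, parent, children, param values."""

    def __init__(self, name, params=None):
        self.name = name
        self.parent = None
        self.children = []
        self.params = dict(params or {})

    def add_child(self, child):
        child.parent = self
        self.children.append(child)

    def path(self):
        if self.parent is None:
            return self.name
        return self.parent.path() + "." + self.name

    def descendants(self):
        yield self
        for child in self.children:
            for node in child.descendants():
                yield node


def find_orphan_params(root):
    """List 'owner.param[index]' for every param value that has no parent."""
    found = []
    for node in root.descendants():
        for pname in sorted(node.params):
            value = node.params[pname]
            is_list = isinstance(value, list)
            for i, v in enumerate(value if is_list else [value]):
                if isinstance(v, Node) and v.parent is None:
                    label = "%s[%d]" % (pname, i) if is_list else pname
                    found.append(node.path() + "." + label)
    return found


# A fuPool that was adopted while its function units were not.
root = Node("root")
system = Node("system")
root.add_child(system)
fus = [Node("fu%d" % i) for i in range(2)]
pool = Node("fuPool", {"FUList": fus})
switch_cpus = Node("switch_cpus", {"fuPool": pool})
system.add_child(switch_cpus)
switch_cpus.add_child(pool)
print(find_orphan_params(root))
# ['root.system.switch_cpus.fuPool.FUList[0]', 'root.system.switch_cpus.fuPool.FUList[1]']
```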

Thanks a lot!
-Sheng

_______________________________________________
m5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users


--
Give our ability to our work, but our genius to our life!





