Here is what I have determined so far:

I first create a checkpoint for ammp using (Note, ammp instantiation is within myse.py): r --trace-flags="MMU" configs/example/myse.py --max_checkpoints=1 --take_checkpoints="3000000,1000" --checkpoint_dir=./checkpoints. See the "AMMP checkpoint MMU trace:" trace below

I then try to run this checkpoint under atomic cpu with: "build/ALPHA_SE/m5.debug --trace-flags=MMU configs/example/se.py -r 1 --checkpoint_dir=./checkpoints". The python is able to execute m5.restoreCheckpoint (root,joinpath(cptdir, "cpt.%s" % cpts[cpt_num - 1])) without a problem. It is in src/sim/main.cc "PyRun_SimpleString("m5.main.main()");" that the system fails. At the point in sim/process.cc:argsInit(int intSize, int pageSize), "pTable->allocate(stack_min, roundUp(stack_size, pageSize));" causes an exception. This exception is "fatal: PageTable::allocate: address 0x11ff92000 already mapped @ cycle 3000000". See "AMMP Already Mapped error:" for the output of M5.

At this point, one solution was the comment out the pTable->allocate(...) in sim/process.cc:394. This code does reach the event queue, but quickly dies with "warn: Entering event queue @ 3000000. Starting simulation...panic: Page table fault when accessing virtual address 0xffffffffffff8610"

I then attempted going closer to the source and change src/mem/page_table.cc:allocate(), and replace fatal with warn on line 110, and have it return after giving a warning (see code 1 below). This results in an exception the same as above: "panic: Page table fault when accessing virtual address 0xffffffffffff8610@ cycle 3035000 " a little further into the binary (5000 ticks). This has the output given under heading "AMMP remove fault and return" below.

   code 1:
   if (iter != pTable.end()) {
       // already mapped
       warn ("PageTable::allocate: address 0x%x already mapped", vaddr);
       return;//FIXME: Was fatal
   }

My final attempt was to remove the return of code 1 and just let the code allocate two page table entries for the same page. This gets a little further, but ends with "panic: Page table fault when accessing virtual address 0xffffffffffff8d88 @ cycle 3026000" (6000 ticks).

This brings up some questions. Steve guessed that the machine is trying to set itself up (which I think happens in LiveProcess::argsInit()), and the checkpoint is trying to overwrite it (which I think occurs in python Simulation.py "m5.restoreCheckpoint(root,joinpath(cptdir, "cpt.%s" % cpts[cpt_num - 1]))"). If these are indeed the functions doing this, then the opposite appears to be happening. restoreCheckpoint(...) is called first in Simulations.py and then the argsInit is called in C++, which finds that the checkpoint code has already added entries and caused the fault. When I then replace the fault with a warn and return, the code reaches execution but is unable to translate some of the entries. This seems to suggest either that the checkpoint didn't record all of the entires, the restore did not load all of the entries, or I am jumping to the wrong place in the binary (a location whose page table entries are not present) but that the binary is trying to do a store. Any ideas of how to proceed with this interesting problem.

-Richard


"AMMP checkpoint MMU trace:"

Copyright (c) 2001-2006
The Regents of The University of Michigan
All Rights Reserved


M5 compiled Oct 24 2007 12:16:01
M5 started Wed Oct 24 12:24:07 2007
M5 executing on rickshin-2.local
command line: /Library/WebServer/Documents/research/m5/m5-2.0b3/build/ALPHA_SE/m5.debug --trace-flags=MMU configs/example/myse.py --max_checkpoints=1 --take_checkpoints=3000000,1000 --checkpoint_dir=./checkpoints
Global frequency set at 1000000000000 ticks per second
     0: system.membus: port list has 1 entries
     0: system.membus: port list has 1 entries
     0: system.membus: port list has 1 entries
     0: global: Allocating Page: 0x120000000-0x120002000
     0: global: Allocating Page: 0x120002000-0x120004000
     0: global: Allocating Page: 0x120004000-0x120006000
     0: global: Allocating Page: 0x120006000-0x120008000
     0: global: Allocating Page: 0x120008000-0x12000a000
     0: global: Allocating Page: 0x12000a000-0x12000c000
     0: global: Allocating Page: 0x12000c000-0x12000e000
     0: global: Allocating Page: 0x12000e000-0x120010000
     0: global: Allocating Page: 0x120010000-0x120012000
     0: global: Allocating Page: 0x120012000-0x120014000
     0: global: Allocating Page: 0x120014000-0x120016000
     0: global: Allocating Page: 0x120016000-0x120018000
     0: global: Allocating Page: 0x120018000-0x12001a000
     0: global: Allocating Page: 0x12001a000-0x12001c000
     0: global: Allocating Page: 0x12001c000-0x12001e000
     0: global: Allocating Page: 0x12001e000-0x120020000
     0: global: Allocating Page: 0x120020000-0x120022000
     0: global: Allocating Page: 0x120022000-0x120024000
     0: global: Allocating Page: 0x120024000-0x120026000
     0: global: Allocating Page: 0x120026000-0x120028000
     0: global: Allocating Page: 0x120028000-0x12002a000
     0: global: Allocating Page: 0x12002a000-0x12002c000
     0: global: Allocating Page: 0x12002c000-0x12002e000
     0: global: Allocating Page: 0x12002e000-0x120030000
     0: global: Allocating Page: 0x120030000-0x120032000
     0: global: Allocating Page: 0x120032000-0x120034000
     0: global: Allocating Page: 0x120034000-0x120036000
     0: global: Allocating Page: 0x120036000-0x120038000
     0: global: Allocating Page: 0x120038000-0x12003a000
     0: global: Allocating Page: 0x12003a000-0x12003c000
     0: global: Allocating Page: 0x12003c000-0x12003e000
     0: global: Allocating Page: 0x12003e000-0x120040000
     0: global: Allocating Page: 0x120040000-0x120042000
     0: global: Allocating Page: 0x120042000-0x120044000
     0: global: Allocating Page: 0x120044000-0x120046000
     0: global: Allocating Page: 0x120046000-0x120048000
     0: global: Allocating Page: 0x120048000-0x12004a000
     0: global: Allocating Page: 0x12004a000-0x12004c000
     0: global: Allocating Page: 0x12004c000-0x12004e000
     0: global: Allocating Page: 0x12004e000-0x120050000
     0: global: Allocating Page: 0x120050000-0x120052000
     0: global: Allocating Page: 0x120052000-0x120054000
     0: global: Allocating Page: 0x120054000-0x120056000
     0: global: Allocating Page: 0x120056000-0x120058000
     0: global: Allocating Page: 0x120058000-0x12005a000
     0: global: Allocating Page: 0x12005a000-0x12005c000
     0: global: Allocating Page: 0x12005c000-0x12005e000
     0: global: Allocating Page: 0x140000000-0x140002000
     0: global: Allocating Page: 0x140002000-0x140004000
     0: global: Allocating Page: 0x140004000-0x140006000
     0: global: Allocating Page: 0x140006000-0x140008000
     0: global: Allocating Page: 0x140008000-0x14000a000
     0: global: Allocating Page: 0x14000a000-0x14000c000
     0: global: Allocating Page: 0x14000c000-0x14000e000
     0: global: Allocating Page: 0x14000e000-0x140010000
     0: global: Allocating Page: 0x140010000-0x140012000
     0: global: Allocating Page: 0x140012000-0x140014000
     0: global: Allocating Page: 0x11ff92000-0x11ff9c000
warn: Entering event queue @ 0.  Starting simulation...
   500: global: Allocating Page: 0x11ff90000-0x11ff92000
warn: Increasing stack size by one page.
291500: global: Allocating Page: 0x140014000-0x140016000
291500: global: Allocating Page: 0x140016000-0x140018000
291500: global: Allocating Page: 0x140018000-0x14001a000
291500: global: Allocating Page: 0x14001a000-0x14001c000
291500: global: Allocating Page: 0x14001c000-0x14001e000
291500: global: Allocating Page: 0x14001e000-0x140020000
291500: global: Allocating Page: 0x140020000-0x140022000
Writing checkpoint
Exiting @ cycle 3000000 because maximum 1 checkpoints dropped
Simulation done.

Program exited normally.

"AMMP Already Mapped error:"

Copyright (c) 2001-2006
The Regents of The University of Michigan
All Rights Reserved


M5 compiled Oct 24 2007 12:16:01
M5 started Wed Oct 24 12:39:05 2007
M5 executing on rickshin-2.local
command line: /Library/WebServer/Documents/research/m5/m5-2.0b3/build/ALPHA_SE/m5.debug --trace-flags=MMU configs/example/se.py -r 1 --checkpoint_dir=./checkpoints
Global frequency set at 1000000000000 ticks per second
     0: system.membus: port list has 1 entries
     0: system.membus: port list has 1 entries
     0: system.membus: port list has 1 entries
Restoring checkpoint ...
Restoring from checkpoint
Done.
Simulation starting: no checkpoints
3000000: global: Allocating Page: 0x12005e000-0x120060000
3000000: global: Allocating Page: 0x120060000-0x120062000
3000000: global: Allocating Page: 0x120062000-0x120064000
3000000: global: Allocating Page: 0x120064000-0x120066000
3000000: global: Allocating Page: 0x120066000-0x120068000
3000000: global: Allocating Page: 0x120068000-0x12006a000
3000000: global: Allocating Page: 0x12006a000-0x12006c000
3000000: global: Allocating Page: 0x12006c000-0x12006e000
3000000: global: Allocating Page: 0x12006e000-0x120070000
3000000: global: Allocating Page: 0x120070000-0x120072000
3000000: global: Allocating Page: 0x120072000-0x120074000
3000000: global: Allocating Page: 0x120084000-0x120086000
3000000: global: Allocating Page: 0x120086000-0x120088000
3000000: global: Allocating Page: 0x120088000-0x12008a000
3000000: global: Allocating Page: 0x12008a000-0x12008c000
3000000: global: Allocating Page: 0x12008c000-0x12008e000
3000000: global: Allocating Page: 0x11ff92000-0x11ff9c000
fatal: PageTable::allocate: address 0x11ff92000 already mapped
@ cycle 3000000
[allocate:build/ALPHA_SE/mem/page_table.cc, line 110]
Memory Usage: 0 KBytes

Program exited with code 01.

"AMMP remove fault and return":

M5 compiled Oct 24 2007 13:46:15
M5 started Wed Oct 24 13:46:50 2007
M5 executing on rickshin-2.local
command line: /Library/WebServer/Documents/research/m5/m5-2.0b3/build/ALPHA_SE/m5.debug -d ../results --trace-flags=MMU configs/example/se.py -r 1 --checkpoint_dir=./checkpoints
Global frequency set at 1000000000000 ticks per second
     0: system.membus: port list has 1 entries
     0: system.membus: port list has 1 entries
     0: system.membus: port list has 1 entries
Restoring checkpoint ...
Restoring from checkpoint
Done.
Simulation starting: no checkpoints
3000000: global: Allocating Page: 0x12005e000-0x120060000
3000000: global: Allocating Page: 0x120060000-0x120062000
3000000: global: Allocating Page: 0x120062000-0x120064000
3000000: global: Allocating Page: 0x120064000-0x120066000
3000000: global: Allocating Page: 0x120066000-0x120068000
3000000: global: Allocating Page: 0x120068000-0x12006a000
3000000: global: Allocating Page: 0x12006a000-0x12006c000
3000000: global: Allocating Page: 0x12006c000-0x12006e000
3000000: global: Allocating Page: 0x12006e000-0x120070000
3000000: global: Allocating Page: 0x120070000-0x120072000
3000000: global: Allocating Page: 0x120072000-0x120074000
3000000: global: Allocating Page: 0x120084000-0x120086000
3000000: global: Allocating Page: 0x120086000-0x120088000
3000000: global: Allocating Page: 0x120088000-0x12008a000
3000000: global: Allocating Page: 0x12008a000-0x12008c000
3000000: global: Allocating Page: 0x12008c000-0x12008e000
3000000: global: Allocating Page: 0x11ff92000-0x11ff9c000
warn: PageTable::allocate: address 0x11ff92000 already mapped
warn: Entering event queue @ 3000000.  Starting simulation...
panic: Page table fault when accessing virtual address 0xffffffffffff8610
@ cycle 3035000
[invoke:build/ALPHA_SE/sim/faults.cc, line 65]



_______________________________________________
m5-users mailing list
m5-users@m5sim.org
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users

Reply via email to