Thank you. With the patch applied, I can confirm that the assertion problem has been fixed (after recreating the checkpoint).
My problems with the O3CPU persist, and I was wondering whether this is specific to X86 or a general problem.

-- Marco

On 28/08/12 21:28, Nilay Vaish wrote:
> The cause of the assert failure was tracked down recently by Jason
> Power. The patch is on the review board. Here is the link -
> http://reviews.gem5.org/r/1365
>
> It will be committed to the mainline soon.
>
> --
> Nilay
>
>
> On Tue, 28 Aug 2012, Marco Elver wrote:
>
>> Hi all,
>>
>> I would like to ask if what I am trying to do is even possible (and if
>> so, how??), as I have been running into a few problems, despite
>> following the advice I could find in older mailing-list threads or the
>> wiki. My goal would be to run a full-system with ruby (with
>> MOESI_CMP_directory), multiple processors of type O3CPU and the X86 ISA;
>> I create a snapshot after the Linux kernel loaded and before the
>> benchmark enters the ROI.
>>
>> With revision 9174:2171e04a2ee5 (Mon Aug 27 20:53:20 2012 -0400) from
>> the dev repository, I tried the following:
>> (1) Take a checkpoint with ruby_fs, the *MOESI_hammer* protocol
>> (only one supporting checkpoints, according to Wiki) and the
>> TimingSimpleCPU (succeeds):
>> $> build/X86_MOESI_hammer/gem5.opt
>> --outdir=m5out/rawdata/fluidanimate/ckpt configs/example/ruby_fs.py -n
>> 16 --cpu-type=timing --kernel=system/x86_64-vmlinux-2.6.28.smp
>> --checkpoint-dir=m5out/checkpoints/fluidanimate --max-checkpoints=1
>> --script=contrib/initscripts/parsec/fluidanimate.sh
>>
>> (2) Resume from the checkpoint with the O3CPU, restore with
>> TimingSimpleCPU (fails):
>> $> build/X86_MOESI_hammer/gem5.opt
>> --outdir=m5out/rawdata/fluidanimate/detailed configs/example/ruby_fs.py
>> -n 16 --cpu-type=detailed --kernel=system/x86_64-vmlinux-2.6.28.smp
>> --checkpoint-dir=m5out/checkpoints/fluidanimate -r 0
>> --restore-with-cpu=timing
>> [...]
>> Switch at curTick count:10000
>> info: Entering event queue @ 0. Starting simulation...
>> Runtime Error at MOESI_hammer-dir.sm:1270, Ruby Time:
>> 1111185: assert failure, PID: 2742
>> press return to continue.
>>
>> Program aborted at cycle 555592500
>>
>> (3) Resume from the checkpoint with the TimingSimpleCPU fails in the
>> same way as (2), as in (2) the CPU isn't even switched to the O3CPU
>> before it fails.
>>
>> (4) Though if I try taking a snapshot right after starting the
>> simulator (after ~ 10000000000 cycles, kernel still booting) and then
>> try to restore with the TimingSimpleCPU, it works as expected; only the
>> O3CPU fails with a segfault and the following backtrace:
>> #0 0x0000000000cdff56 in MasterPort::sendTimingReq
>> (this=<optimized out>, pkt=0x6f8a060)
>> at build/X86/mem/port.cc:136
>> #1 0x00000000005fbac5 in sendTiming (pkt=0x6f8a060,
>> sendingState=0x61a7cc0, this=0x49a9e60)
>> at build/X86/arch/x86/pagetable_walker.cc:173
>> #2 X86ISA::Walker::WalkerState::sendPackets (this=0x61a7cc0)
>> at build/X86/arch/x86/pagetable_walker.cc:631
>> #3 0x00000000005fc8c2 in
>> X86ISA::Walker::WalkerState::recvPacket (this=this@entry=0x61a7cc0,
>> pkt=pkt@entry=0x1e99920) at
>> build/X86/arch/x86/pagetable_walker.cc:590
>> #4 0x00000000005fcb98 in X86ISA::Walker::recvTimingResp
>> (this=0x43706c0, pkt=0x1e99920)
>> at build/X86/arch/x86/pagetable_walker.cc:129
>> #5 0x0000000000ce1f5b in PacketQueue::trySendTiming
>> (this=0x42ba5e0)
>> at build/X86/mem/packet_queue.cc:152
>> #6 0x0000000000ce2929 in PacketQueue::sendDeferredPacket
>> (this=0x42ba5e0)
>> at build/X86/mem/packet_queue.cc:190
>> #7 0x0000000000c391be in EventQueue::serviceOne
>> (this=<optimized out>) at build/X86/sim/eventq.cc:204
>> #8 0x0000000000c7d342 in simulate
>> (num_cycles=9223372036854785807) at build/X86/sim/simulate.cc:71
>> #9 0x0000000000b8e17c in _wrap_simulate__SWIG_0
>> (args=<optimized out>)
>> at build/X86/python/swig/event_wrap.cc:4755
>> #10 _wrap_simulate (self=<optimized out>, args=<optimized out>)
>> at build/X86/python/swig/event_wrap.cc:4804
>> #11 0x00007fb32a094fc6 in PyEval_EvalFrameEx () from
>> /lib/libpython2.7.so.1.0
>>
>> Trying to restore with ruby using MOESI_CMP_directory and the
>> TimingSimpleCPU results in the same error as (2), with the difference
>> that it finishes loading the checkpoint, resumes, but then fails after
>> about a minute ("Runtime Error at MOESI_CMP_directory-dir.sm:485, Ruby
>> Time: 12038425921: assert failure, PID: 19169"). Using the O3CPU still
>> results in the same error as (4).
>>
>> In addition, I have seen workflows of: 1) create checkpoint without ruby
>> and with the AtomicSimpleCPU 2) load checkpoint with ruby and the
>> TimingSimpleCPU. I tried this, and it works if I set
>> --restore-with-cpu=timing. But trying this with the O3CPU doesn't work,
>> resulting in the same backtrace as (4).
>>
>> Is what I'm trying to do possible? If so, any workarounds I should
>> know of?
>>
>> Thanks,
>> Marco
>>
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>> _______________________________________________
>> gem5-users mailing list
>> [email protected]
>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
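
For anyone trying to reproduce this, the two invocations from steps (1) and (2) of the quoted message can be written down as a small Python sketch. It only assembles the argument lists from the flags quoted above and runs nothing; the helper name `gem5_command` and the constant names are made up for illustration and are not part of gem5.

```python
# Sketch only: assembles the two ruby_fs.py invocations quoted in this thread.
# gem5_command and the constants below are hypothetical names, not gem5 APIs.

BINARY = "build/X86_MOESI_hammer/gem5.opt"
CONFIG = "configs/example/ruby_fs.py"
KERNEL = "system/x86_64-vmlinux-2.6.28.smp"
CKPT_DIR = "m5out/checkpoints/fluidanimate"


def gem5_command(outdir, cpu_type, extra):
    """Return the argv list for one run; nothing is executed here."""
    return [
        BINARY,
        "--outdir=" + outdir,
        CONFIG,
        "-n", "16",
        "--cpu-type=" + cpu_type,
        "--kernel=" + KERNEL,
        "--checkpoint-dir=" + CKPT_DIR,
    ] + list(extra)


# Step (1): take a single checkpoint with the TimingSimpleCPU.
take_ckpt = gem5_command(
    "m5out/rawdata/fluidanimate/ckpt",
    "timing",
    ["--max-checkpoints=1",
     "--script=contrib/initscripts/parsec/fluidanimate.sh"],
)

# Step (2): restore with the TimingSimpleCPU, then switch to the O3CPU.
restore_o3 = gem5_command(
    "m5out/rawdata/fluidanimate/detailed",
    "detailed",
    ["-r", "0", "--restore-with-cpu=timing"],
)

if __name__ == "__main__":
    print(" ".join(take_ckpt))
    print(" ".join(restore_o3))
```

Keeping the shared flags in one place makes it harder for the checkpoint and restore runs to drift apart (core count, kernel, checkpoint directory), which matters when comparing failures across CPU types.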
