Re: [gem5-users] Checkpointing possible with Ruby, X86, TimingSimpleCPU and O3CPU?

Marco Elver Wed, 29 Aug 2012 15:11:31 -0700

Thank you, with the patch I can confirm that the assertion problem has
been fixed (after recreating the checkpoint).


My problems with the O3CPU persist, however. Is this specific to X86,
or is it a general problem?
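
For reference, the two-phase workflow I am attempting (steps (1) and (2)
in my original message quoted below) can be sketched as a small Python
helper that assembles the two command lines. The function and constant
names are made up for illustration only; the flags and paths are exactly
the ones from my original commands, and nothing here is part of gem5
itself.

```python
import shlex

# Paths and flags copied from the commands in the quoted message.
GEM5 = "build/X86_MOESI_hammer/gem5.opt"
CONFIG = "configs/example/ruby_fs.py"
KERNEL = "system/x86_64-vmlinux-2.6.28.smp"
CKPT_DIR = "m5out/checkpoints/fluidanimate"


def base_cmd(outdir, cpu_type, num_cpus=16):
    """Arguments shared by the checkpoint run and the restore run
    (hypothetical helper, for illustration)."""
    return [
        GEM5, f"--outdir={outdir}", CONFIG,
        "-n", str(num_cpus),
        f"--cpu-type={cpu_type}",
        f"--kernel={KERNEL}",
        f"--checkpoint-dir={CKPT_DIR}",
    ]


def checkpoint_cmd():
    """Step (1): boot with the TimingSimpleCPU and take one checkpoint."""
    return base_cmd("m5out/rawdata/fluidanimate/ckpt", "timing") + [
        "--max-checkpoints=1",
        "--script=contrib/initscripts/parsec/fluidanimate.sh",
    ]


def restore_cmd():
    """Step (2): restore checkpoint 0 with the TimingSimpleCPU,
    with the O3CPU ("detailed") as the target CPU type."""
    return base_cmd("m5out/rawdata/fluidanimate/detailed", "detailed") + [
        "-r", "0",
        "--restore-with-cpu=timing",
    ]


if __name__ == "__main__":
    # Print both command lines so they can be pasted into a shell.
    for cmd in (checkpoint_cmd(), restore_cmd()):
        print(shlex.join(cmd))
```

The only differences between the two runs are the output directory, the
CPU type, and the trailing checkpoint/restore options.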

-- Marco

On 28/08/12 21:28, Nilay Vaish wrote:
> The cause of the assert failure was tracked down recently by Jason
> Power. The patch is on the review board. Here is the link -
> http://reviews.gem5.org/r/1365
>
> It will be committed to the mainline soon.
>
> -- 
> Nilay
>
>
> On Tue, 28 Aug 2012, Marco Elver wrote:
>
>> Hi all,
>>
>> I would like to ask whether what I am trying to do is even possible
>> (and, if so, how), as I have been running into a few problems despite
>> following the advice I could find in older mailing-list threads and on
>> the wiki. My goal is to run a full-system simulation with Ruby (using
>> MOESI_CMP_directory), multiple processors of type O3CPU, and the X86
>> ISA. I create a checkpoint after the Linux kernel has loaded and before
>> the benchmark enters the ROI.
>>
>> With revision 9174:2171e04a2ee5 (Mon Aug 27 20:53:20 2012 -0400) from
>> the dev repository, I tried the following:
>>    (1) Take a checkpoint with ruby_fs, the *MOESI_hammer* protocol
>> (the only one supporting checkpoints, according to the wiki) and the
>> TimingSimpleCPU (succeeds):
>>           $> build/X86_MOESI_hammer/gem5.opt
>> --outdir=m5out/rawdata/fluidanimate/ckpt configs/example/ruby_fs.py -n
>> 16 --cpu-type=timing --kernel=system/x86_64-vmlinux-2.6.28.smp
>> --checkpoint-dir=m5out/checkpoints/fluidanimate --max-checkpoints=1
>> --script=contrib/initscripts/parsec/fluidanimate.sh
>>
>>    (2) Resume from the checkpoint with the O3CPU, restore with
>> TimingSimpleCPU (fails):
>>           $> build/X86_MOESI_hammer/gem5.opt
>> --outdir=m5out/rawdata/fluidanimate/detailed configs/example/ruby_fs.py
>> -n 16 --cpu-type=detailed --kernel=system/x86_64-vmlinux-2.6.28.smp
>> --checkpoint-dir=m5out/checkpoints/fluidanimate -r 0
>> --restore-with-cpu=timing
>>           [...]
>>           Switch at curTick count:10000
>>           info: Entering event queue @ 0.  Starting simulation...
>>           Runtime Error at MOESI_hammer-dir.sm:1270, Ruby Time:
>> 1111185: assert failure, PID: 2742
>>           press return to continue.
>>
>>           Program aborted at cycle 555592500
>>
>>    (3) Resuming from the checkpoint with only the TimingSimpleCPU fails
>> in the same way as (2); in (2) the CPU is not even switched to the
>> O3CPU before the failure occurs.
>>
>>    (4) However, if I take a checkpoint right after starting the
>> simulator (after ~ 10000000000 cycles, kernel still booting) and then
>> restore with the TimingSimpleCPU, it works as expected; only the
>> O3CPU fails, with a segfault and the following backtrace:
>>        #0  0x0000000000cdff56 in MasterPort::sendTimingReq
>> (this=<optimized out>, pkt=0x6f8a060)
>>            at build/X86/mem/port.cc:136
>>        #1  0x00000000005fbac5 in sendTiming (pkt=0x6f8a060,
>> sendingState=0x61a7cc0, this=0x49a9e60)
>>            at build/X86/arch/x86/pagetable_walker.cc:173
>>        #2  X86ISA::Walker::WalkerState::sendPackets (this=0x61a7cc0)
>>            at build/X86/arch/x86/pagetable_walker.cc:631
>>        #3  0x00000000005fc8c2 in
>> X86ISA::Walker::WalkerState::recvPacket (this=this@entry=0x61a7cc0,
>>            pkt=pkt@entry=0x1e99920) at
>> build/X86/arch/x86/pagetable_walker.cc:590
>>        #4  0x00000000005fcb98 in X86ISA::Walker::recvTimingResp
>> (this=0x43706c0, pkt=0x1e99920)
>>            at build/X86/arch/x86/pagetable_walker.cc:129
>>        #5  0x0000000000ce1f5b in PacketQueue::trySendTiming
>> (this=0x42ba5e0)
>>            at build/X86/mem/packet_queue.cc:152
>>        #6  0x0000000000ce2929 in PacketQueue::sendDeferredPacket
>> (this=0x42ba5e0)
>>            at build/X86/mem/packet_queue.cc:190
>>        #7  0x0000000000c391be in EventQueue::serviceOne
>> (this=<optimized out>) at build/X86/sim/eventq.cc:204
>>        #8  0x0000000000c7d342 in simulate
>> (num_cycles=9223372036854785807) at build/X86/sim/simulate.cc:71
>>        #9  0x0000000000b8e17c in _wrap_simulate__SWIG_0
>> (args=<optimized out>)
>>            at build/X86/python/swig/event_wrap.cc:4755
>>        #10 _wrap_simulate (self=<optimized out>, args=<optimized out>)
>>            at build/X86/python/swig/event_wrap.cc:4804
>>        #11 0x00007fb32a094fc6 in PyEval_EvalFrameEx () from
>> /lib/libpython2.7.so.1.0
>>
>> Trying to restore with ruby using MOESI_CMP_directory and the
>> TimingSimpleCPU results in the same error as (2), with the difference
>> that it finishes loading the checkpoint, resumes, but then fails after
>> about a minute ("Runtime Error at MOESI_CMP_directory-dir.sm:485, Ruby
>> Time: 12038425921: assert failure, PID: 19169"). Using the O3CPU still
>> results in the same error as (4).
>>
>> In addition, I have seen workflows of: 1) create checkpoint without ruby
>> and with the AtomicSimpleCPU 2) load checkpoint with ruby and the
>> TimingSimpleCPU. I tried this, and it works if I set
>> --restore-with-cpu=timing. But trying this with the O3CPU doesn't work,
>> resulting in the same backtrace as (4).
>>
>> Is what I'm trying to do possible? If so, any workarounds I should
>> know of?
>>
>> Thanks,
>> Marco
>>
>>
>> -- 
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>> _______________________________________________
>> gem5-users mailing list
>> [email protected]
>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>>


