Hi Ciro,

Thanks for your suggestion. I should have given more details. 

The .rcS script I use to take the checkpoint is below. I downloaded it
from the official dist-gem5 website and did not modify it.

> #!/bin/bash
> # Authors: Mohammad Alian <mali...@illinois.edu>
> # boot gem5 and take a checkpoint
> #
> # The idea of this script is the same as
> # "configs/boot/hack_back_ckpt.rcS" by Joel Hestness
> # Please look into that for more info
> #
> source /root/.bashrc
> # Retrieve dist-gem5 rank and size parameters using the 'm5' utility
> MY_RANK=$(/sbin/m5 initparam dist-rank)
> [ $? = 0 ] || { echo "m5 initparam failed"; exit -1; }
> MY_SIZE=$(/sbin/m5 initparam dist-size)
> [ $? = 0 ] || { echo "m5 initparam failed"; exit -1; }
> echo "***** Start boot script! *****"
> if [ "${RUNSCRIPT_VAR+set}" != set ]
> then
>     # Signal our future self that it's safe to continue
>     echo "RUNSCRIPT_VAR not set! Setting it ..."
>     export RUNSCRIPT_VAR=1
> else
>     echo "RUNSCRIPT_VAR is set!"
>     # We've already executed once, so we should exit
>     echo "calling m5 exit ..."
>     /sbin/m5 exit 1
> fi
> /bin/hostname node${MY_RANK}
> # Keep MAC address assignment simple for now ...
> (($MY_RANK > 97)) && { echo "(E) Rank must be less than 98"; /sbin/m5 abort; }
> ((MY_ADDR = MY_RANK + 2))
> if (($MY_ADDR < 10))
> then
>     MY_ADDR_PADDED=0${MY_ADDR}
> else
>     MY_ADDR_PADDED=${MY_ADDR}
> fi
> /sbin/ifconfig eth0 hw ether 00:90:00:00:00:${MY_ADDR_PADDED}
> /sbin/ifconfig eth0 192.168.0.${MY_ADDR} netmask 255.255.255.0 up
> /sbin/ifconfig -a
> # take a checkpoint
> if [ "$MY_RANK" == "0" ]
> then
>     /sbin/m5 checkpoint 1
> else
>     sleep 0.01
> fi
> #THIS IS WHERE EXECUTION BEGINS FROM AFTER RESTORING FROM CKPT
> if [ "$RUNSCRIPT_VAR" -eq 1 ]
> then
>     # Signal our future self not to recurse infinitely
>     export RUNSCRIPT_VAR=2
>     # Read the script for the checkpoint restored execution
>     echo "Loading new script..."
>     /sbin/m5 readfile > /tmp/runscript1.sh
>     # Execute the new runscript
>     if [ -s /tmp/runscript1.sh ]
>     then
>         /bin/bash /tmp/runscript1.sh
>     else
>         echo "Script not specified"
>     fi
> fi
> echo "Fell through script. Exiting ..."
> /sbin/m5 exit 1
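
For anyone reading the script: the MAC/IP assignment is just arithmetic on the rank. Here is a small standalone sketch of my own (not part of the dist-gem5 script; the function name addr_for_rank is made up) that reproduces it and only prints the values, using printf "%02d" in place of the if/else padding:

```shell
#!/bin/bash
# Standalone sketch of the address arithmetic in the rcS script above.
# It only prints the values; it does not call ifconfig or m5.
addr_for_rank() {
    local rank=$1
    # Same limit as the script: MY_ADDR must fit in two decimal digits
    if (( rank > 97 )); then
        echo "(E) Rank must be less than 98" >&2
        return 1
    fi
    local addr=$(( rank + 2 ))
    # %02d gives the same zero padding as the MY_ADDR_PADDED branch
    printf '00:90:00:00:00:%02d 192.168.0.%d\n' "$addr" "$addr"
}

for r in 0 1; do
    addr_for_rank "$r"
done
```

With dist-size=2, ranks 0 and 1 should therefore come up as 192.168.0.2 and 192.168.0.3.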

When I take a checkpoint with "aarch64-ubuntu-trusty-headless.img" using
dist-gem5, it works. The relevant part of the output file log.0 is as
follows:

> warn: Device specific PCI config space not implemented for testsys.realview.ethernet!
> 26258473387000: global: DistIface::readyToCkpt() called, delay:1 period:0
> info: m5 checkpoint called with non-zero delay => triggering immediate checkpoint (at the next sync)
> 26258480000000: global: DistIFace::drain() called
> 26258480000500: global: DistIFace::drain() called
> info: Entering event queue @ 26258480000000.  Starting simulation...
> Writing checkpoint
> 26258480000500: global: DistIFace::drainResume() called
> info: Entering event queue @ 26258480000500.  Starting simulation...
> 26267427931500: global: DistIface::readyToExit() called, delay:1
> info: m5 exit called with non-zero delay => triggering immediate exit (at the next sync)
> Exiting @ tick 26267430000000 because exit request from gem5 peers

However, when I take a checkpoint with
"aarch32-ubuntu-natty-headless.img", it does not work. The same part of
log.0 looks like this:

> warn:  instruction 'mcr bpiall' unimplemented
> 3362818440000: global: DistIface::readyToCkpt() called, delay:1 period:0
> info: m5 checkpoint called with non-zero delay => triggering immediate checkpoint (at the next sync)
> 3385307368500: global: DistIface::readyToExit() called, delay:1
> info: m5 exit called with non-zero delay => triggering immediate exit (at the next sync)
> info: recv(): Connection closed
> Exiting @ tick 3389177332000 because connection to gem5 peer got closed

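As you can see, the checkpoint-related lines from the first log.0 (the two
DistIFace::drain() calls, "Writing checkpoint", and
DistIFace::drainResume()) are missing from the second log.0. Instead, rank 0
reaches m5 exit and then the connection to its gem5 peer gets closed.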
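
To check this across runs without reading the whole log, a grep-based sketch like the following could be used. The marker strings are copied from the excerpts above; the function name ckpt_written and the two log paths are only placeholders for my own directory layout:

```shell
#!/bin/bash
# Sketch: report whether a dist-gem5 log.0 shows the checkpoint being written.
# Marker strings come from the log excerpts above (fixed-string match, -F).
ckpt_written() {
    local log=$1
    grep -qF 'DistIFace::drain() called' "$log" &&
        grep -qF 'Writing checkpoint' "$log" &&
        grep -qF 'DistIFace::drainResume() called' "$log"
}

# Hypothetical output directories for the two runs; adjust to your layout.
for log in aarch64-run/m5out.0/log.0 aarch32-run/m5out.0/log.0; do
    if [ ! -f "$log" ]; then
        echo "$log: not found"
    elif ckpt_written "$log"; then
        echo "$log: checkpoint written"
    else
        echo "$log: no checkpoint"
    fi
done
```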

On 2018-07-01 02:13 AM, Ciro Santilli wrote:

> Just saw the attachments now. 
> 
> I would recommend in-lining them as much as possible in the email, and 
> selecting the most interesting part if they are huge. 
> 
> This will make it more likely that people will look at them, and allow search 
> engines to index them. 
> 
> On Sun, Jul 1, 2018 at 9:56 AM, Ciro Santilli <ciro.santi...@gmail.com> wrote:
> 
> How did you try to take the checkpoint? Manually or with some init script? 
> 
> How did you try to restore it, and how did it fail?
> 
> Did the init script actually run? Add prints or set -x to it.
> 
> On Sun, Jul 1, 2018 at 7:45 AM, Boyang Xu <6172...@gmail.com> wrote: 
> 
> Hi everyone, 
> 
> I failed to take a checkpoint with aarch32-ubuntu-natty-headless.img in
> dist-gem5, but succeeded with aarch64-ubuntu-trusty-headless.img.
> The input and output files are attached. 
> 
> My command line is as follows:
> 
> build/ARM/gem5.opt \
>   -d m5out.0 \
>   --debug-flags=DistEthernet \
>   configs/example/fs.py \
>   --cpu-type=AtomicSimpleCPU --num-cpus=1 --machine-type=VExpress_EMM \
>   --disk-image=aarch32-ubuntu-natty-headless.img \
>   --kernel=vmlinux.aarch32.ll_20131205.0-gem5 \
>   --script=boot.easy.ckpt.rcS \
>   --checkpoint-dir=m5out.0 \
>   --dist --dist-rank=0 --dist-size=2 --dist-server-name=127.0.0.1 \
>   --dist-server-port=2200
> 
> Any suggestions or help on taking a checkpoint with linux_32bit.img in
> dist-gem5 would be welcome. Thanks a lot!
> 
> Best Regards, 
> Boyang Xu 
> 
> A graduate student in UVIC
> _______________________________________________
> gem5-users mailing list
> gem5-users@gem5.org
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users [1]

-- 
Best Regards,
Boyang Xu

A graduate student in UVIC 

Links:
------
[1] http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users