Hi, Mohammad

I solved the problem. I modified the file boot.easy.ckpt.sh as shown in
Table 1.a) so that each node pings the other node before the checkpoint is
taken, and the checkpoint was then created successfully. I am wondering if
there is a better way to create a checkpoint in a heterogeneous cluster with
dist-gem5.

Table 1. a) modified part; b) original part

a)
if [ "$MY_RANK" == "0" ]
then
   sleep 2
   ping -c 1 192.168.0.3
   sleep 1
else
   ping -c 1 192.168.0.2
   /sbin/m5 checkpoint 1
fi

b)
if [ "$MY_RANK" == "0" ]
then
    /sbin/m5 checkpoint 1
else
    sleep 0.01
fi
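
One possible refinement of Table 1.a) is to poll the peer instead of relying on fixed sleeps. The sketch below is untested in dist-gem5 and assumes the ping helps only because it confirms that the other node has finished booting; the thread does not confirm this. The names PEER_IP, MAX_TRIES, PING_CMD, M5_CMD, wait_for_peer and checkpoint_when_peer_up are made up for illustration and are not part of the gem5 scripts.

```shell
# Sketch only: retry a ping until the peer answers, then take the checkpoint.
# All variable and function names here are illustrative assumptions.
PEER_IP="${PEER_IP:-192.168.0.3}"   # address of the other node (from Table 1.a)
MAX_TRIES="${MAX_TRIES:-10}"        # give up after this many failed pings
PING_CMD="${PING_CMD:-ping}"
M5_CMD="${M5_CMD:-/sbin/m5}"

# Return 0 as soon as one ping to $1 succeeds, 1 after $2 failed attempts.
wait_for_peer() {
    wfp_ip="$1"
    wfp_tries="$2"
    wfp_i=0
    while [ "$wfp_i" -lt "$wfp_tries" ]; do
        if "$PING_CMD" -c 1 "$wfp_ip" > /dev/null 2>&1; then
            return 0
        fi
        sleep 1
        wfp_i=$((wfp_i + 1))
    done
    return 1
}

# Take the checkpoint only if the peer was reachable.
checkpoint_when_peer_up() {
    if wait_for_peer "$PEER_IP" "$MAX_TRIES"; then
        "$M5_CMD" checkpoint 1
    else
        echo "peer $PEER_IP unreachable, no checkpoint taken" >&2
        return 1
    fi
}
```

The node that issues the checkpoint would call checkpoint_when_peer_up in place of the fixed ping and sleep sequence, with PEER_IP set to the other node's address.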

Best Regards,
Boyang Xu

A graduate student in UVIC

On Thu, Mar 29, 2018 at 7:14 PM, Mohammad Alian <[email protected]>
wrote:

> Can you post the rcS script that you use for taking the checkpoint? Can you
> ping the other node before taking the checkpoint?
>
> On Thu, Mar 29, 2018 at 6:41 PM, Boyang Xu <[email protected]> wrote:
>
>> Hi, Mohammad
>> The exact problem is that Apache Bench fails to run in the above
>> configuration. The attachments are the output files and input files.
>> BTW, is it possible to create a checkpoint with an Android disk image using
>> dist-gem5? Is there any special requirement on the Android disk image's version?
>> Looking forward to your reply.
>>
>> Best Regards,
>> Boyang Xu
>>
>> A graduate student in UVIC
>>
>> On Thu, Mar 29, 2018 at 2:10 PM, Mohammad Alian <[email protected]>
>> wrote:
>>
>>> I see that both nodes write a checkpoint. What is the problem exactly?
>>>
>>> Best,
>>> Mohammad
>>>
>>>
>>> On Wed, Mar 28, 2018 at 9:22 PM, Boyang Xu <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Although I followed the tutorial “iiswc17-tutorial-final-dist-gem5” to
>>>> model a heterogeneous cluster, I failed to create a checkpoint. There
>>>> are two nodes in the heterogeneous cluster: node 0 has two CPUs while
>>>> node 1 has one CPU. I think the reason is that the whole dist-gem5
>>>> process ends after node 0 finishes creating its checkpoint, while node 1
>>>> has not yet finished initializing and creating its checkpoint, because
>>>> the node 0 gem5 process with two CPUs runs faster than the node 1 gem5
>>>> process with one CPU. Node 1 never actually finishes the checkpoint.
>>>>
>>>> To solve it, I added the command “sleep 5” before or after the
>>>> command “/sbin/m5 checkpoint 1” in the file boot.easy.ckpt.rcS, but it
>>>> still failed. The attachments are my scripts and output files.
>>>>
>>>> Looking forward to your reply.
>>>> Best Regards,
>>>> Boyang Xu
>>>>
>>>> A graduate student in UVIC
>>>>
>>>> _______________________________________________
>>>> gem5-users mailing list
>>>> [email protected]
>>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>>>>