Again, you need to look at all of the log.* files to find out why the
simulation gets killed. Don't look only at log.switch: if any one of the gem5
processes aborts, the entire dist-gem5 simulation is killed.
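
As a rough sketch of what I mean: the helper names and the keyword list below
are my own assumptions, not part of gem5 or of the launch script. The first
function prints the tail of every log.* file in a run directory, and the
second lists the logs that contain typical failure keywords.

```shell
# Print the last few lines of every log.* file in a run directory.
# Assumption: all per-process logs are named log.<something> and sit
# together in one directory.
show_log_tails() {
    run_dir=${1:-.}
    lines=${2:-5}
    for f in "$run_dir"/log.*; do
        [ -f "$f" ] || continue
        echo "==== $f ===="
        tail -n "$lines" "$f"
    done
}

# List the logs that contain common failure keywords.
# The keyword list is a guess, not an exhaustive set of gem5 messages.
find_suspect_logs() {
    run_dir=${1:-.}
    grep -l -i -E 'fatal|panic|abort|segmentation fault' \
        "$run_dir"/log.* 2>/dev/null || true
}
```

For example, `find_suspect_logs /path/to/run_dir` (a placeholder path). Note
that "Killed by signal 15" in log.switch only says the switch received
SIGTERM, so the root cause is more likely in one of the other logs.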

On Wed, Dec 6, 2017 at 1:50 PM, Vitorio Cargnini (lcargnini) <
lcargn...@micron.com> wrote:

> Hi Mohammad,
>
>
>
> Thank you for the prompt response. I checked log.switch; the first error
> I fixed was the path, since the script needs full paths to work. Once I
> fixed that and tried again, it ran and failed a little later.
>
>
>
> Got the following output:
>
> launch switch gem5 process on node0 ...
> waiting for switch to start ..
> node #switch started
> START Wed Dec  6 12:36:04 MST 2017
> starting gem5 on node0...
> starting gem5 on node0...
> starting gem5 on node1...
> starting gem5 on node1...
> starting gem5 on node2 ...
> starting gem5 on node2 ...
> starting gem5 on node3 ...
> starting gem5 on node3 ...
> (I) (some) gem5 process(es) exited
> KILLED Wed Dec  6 12:37:35 MST 2017
> ABORT Wed Dec  6 12:37:35 MST 2017
>
>
>
> The log.switch had the following:
>
> command line: /wada/wada/gem5/build/ARM/gem5.opt -d
> /wada/wada/gem5/m5out.switch --debug-flags=DistEthernet
> /wada/wada/gem5/configs/dist/sw.py 
> --checkpoint-dir=/wada/wada/gem5/m5out.switch
> --is-switch --dist-size=8 --dist-server-port=2200
>
>
>
> info: Standard input is not a terminal, disabling listeners.
> Global frequency set at 1000000000000 ticks per second
>       0: system.portlink0: DistEtherLink::DistEtherLink() link
> delay:10000000 ticksPerByte:800
>       0: global: DistIface() ctor rank:0
> info: tcp_iface listening on port 2200
> Killed by signal 15.
>
>
>
> *From:* gem5-users [mailto:gem5-users-boun...@gem5.org] *On Behalf Of* Mohammad Alian
> *Sent:* Tuesday, December 5, 2017 9:18 PM
> *To:* gem5 users mailing list <gem5-users@gem5.org>
> *Subject:* [EXT] Re: [gem5-users] Running Dist-gem5
>
>
>
> Hi Vitorio,
>
>
>
> You should check the content of log.switch to see why the gem5 process
> simulating the switch cannot start. There are many reasons a gem5 process
> can fail to run. If you post the content of log.switch here, I can help.
>
>
>
> Regarding the "distributed run", you first need to set up passwordless ssh
> between your simulation (physical) hosts and then use the "LSB_MCPU_HOSTS"
> env variable to assign gem5 processes to physical hosts. E.g. if your
> simulated cluster size is 8 and you want to run 4 gem5 processes on
> host_name0 and 4 on host_name1, then your LSB_MCPU_HOSTS looks like this:
>
>
>
> export LSB_MCPU_HOSTS="host_name0 4 host_name1 4"
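
To make the format concrete, here is a minimal sketch of how a string like the
one above can be read as host/count pairs, plus a quick non-interactive check
for passwordless ssh. Both function names are hypothetical, and the ordering
of processes (the first four on host_name0, the next four on host_name1) is my
assumption about the mapping, not something taken from the launch script.

```shell
# Expand "host count host count ..." into one host name per gem5 process.
# Hypothetical helper for illustration; not part of gem5.
expand_hosts() {
    # Intentionally unquoted: split the argument on whitespace.
    set -- $1
    while [ "$#" -ge 2 ]; do
        host=$1
        count=$2
        shift 2
        i=0
        while [ "$i" -lt "$count" ]; do
            echo "$host"
            i=$((i + 1))
        done
    done
}

# Succeeds only if ssh can log in without prompting for a password.
# BatchMode=yes makes ssh fail instead of asking interactively.
check_passwordless_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

With the export above, `expand_hosts "$LSB_MCPU_HOSTS"` prints eight lines,
and running `check_passwordless_ssh` on each distinct host before launching
should show whether the ssh setup is in place.
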
>
>
>
>
>
> Best,
>
> Mohammad
>
>
>
>
>
> On Tue, Dec 5, 2017 at 6:03 PM, Vitorio Cargnini (lcargnini) <
> lcargn...@micron.com> wrote:
>
> Hello,
>
>
>
> Please, what exactly do I need in order to run dist-gem5 with --dist?
>
>
>
> I’m trying, but it fails with “Failed to start switch”.
>
>
>
> Also, what do I need in place for it to start distributed across nodes,
> instead of launching multiple parallel runs on ‘localhost’?
>
>
>
> Regards,
>
> Vitorio.
>
> _______________________________________________
> gem5-users mailing list
> gem5-users@gem5.org
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users