Thanks Mohammad,

I tried that and got the following in the log (log.switch, below). I also tried 
passing the parameters in different orders. What should I look for now?

Log:
gem5 Simulator System.  http://gem5.org
gem5 is copyrighted software; use the --copyright option for details.

gem5 compiled Dec  6 2017 14:35:43
gem5 started Dec 11 2017 11:16:10
gem5 executing on rndarch11, pid 9841
command line: /wada/gem5/build/ARM/gem5.opt -d /wada/gem5/m5out.switch 
--debug-flags=DistEthernet /wada/gem5/configs/dist/sw.py 
--dist-sync-start=1000000000000 --checkpoint-dir=/wada/gem5/m5out.switch 
--is-switch --dist-size=8 --dist-server-port=2200

info: Standard input is not a terminal, disabling listeners.
Global frequency set at 1000000000000 ticks per second
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/wada/gem5/src/python/m5/main.py", line 433, in main
    exec filecode in scope
  File "/wada/gem5/configs/dist/sw.py", line 79, in <module>
    main()
  File "/wada/gem5/configs/dist/sw.py", line 76, in main
    Simulation.run(options, root, None, None)
  File "/wada/gem5/configs/common/Simulation.py", line 589, in run
    m5.instantiate(checkpoint_dir)
  File "/wada/gem5/src/python/m5/simulate.py", line 115, in instantiate
    for obj in root.descendants(): obj.createCCObject()
  File "/wada/gem5/src/python/m5/SimObject.py", line 1484, in createCCObject
    self.getCCParams()
  File "/wada/gem5/src/python/m5/SimObject.py", line 1439, in getCCParams
    setattr(cc_params, param, value)
    setattr(cc_params, param, value)
TypeError: (): incompatible function arguments. The following argument types 
are supported:
    1. (self: _m5.param_DistEtherLink.DistEtherLinkParams, arg0: int) -> None

Invoked with: <_m5.param_DistEtherLink.DistEtherLinkParams object at 
0x7f9a37b8fd80>, 999999999999999983222784L
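
One possible reading of this traceback, offered as an assumption and not a confirmed diagnosis: the rejected value is what you get when 1000000000000 is treated as a number of seconds and multiplied by the 1000000000000 ticks-per-second global frequency using floating point. The Python sketch below only reproduces that arithmetic; the helper name seconds_to_ticks is hypothetical and is not part of gem5.

```python
# Assumption: a bare number given to --dist-sync-start is read as seconds
# and then scaled by the tick frequency with floating-point arithmetic.
TICKS_PER_SECOND = 1000000000000   # "Global frequency" line in the log
UINT64_MAX = 2**64 - 1             # largest value an unsigned 64-bit tick can hold

def seconds_to_ticks(seconds):
    """Hypothetical helper: convert seconds to ticks via a float multiply."""
    return int(float(seconds) * TICKS_PER_SECOND)

ticks = seconds_to_ticks(1000000000000)
print(ticks)                # compare with the value in the TypeError above
print(ticks > UINT64_MAX)   # True would mean it cannot fit in 64 bits
```

If that reading is right, passing the value with an explicit unit (for example a trailing t for ticks, assuming the option accepts one) may be worth trying.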

From: gem5-users [mailto:[email protected]] On Behalf Of Mohammad 
Alian
Sent: Sunday, December 10, 2017 7:54 PM
To: gem5 users mailing list <[email protected]>
Subject: Re: [gem5-users] [EXT] Re: Running Dist-gem5

Oh, you should start synchronization between gem5 nodes before you start 
communication inside the simulated cluster. Use "--dist-sync-start" option to 
start synchronization before send tick (4428354726000). You should pass this 
option to all gem5 processes (FS nodes + switch node). So you should set 
--dist-sync-start as a "--cf-args" argument in your launch script:

--cf-args --dist-sync-start=1000000000000
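
To make the ordering requirement concrete, here is a small Python check that uses only numbers appearing in this thread: the suggested sync start, the send tick from the panic message, and the sync_start value reported in the switch's config.ini. The function name is illustrative only.

```python
def sync_starts_before_first_send(sync_start_tick, first_send_tick):
    """Return True when synchronization begins before the first packet is sent."""
    return sync_start_tick < first_send_tick

FIRST_SEND_TICK = 4428354726000   # send_tick from the panic message
OLD_SYNC_START = 5200000000000    # sync_start shown in m5out.switch/config.ini
NEW_SYNC_START = 1000000000000    # value suggested for --dist-sync-start

old_ok = sync_starts_before_first_send(OLD_SYNC_START, FIRST_SEND_TICK)
new_ok = sync_starts_before_first_send(NEW_SYNC_START, FIRST_SEND_TICK)
print(old_ok, new_ok)
```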


Best,
Mohammad




On Fri, Dec 8, 2017 at 12:36 PM, Vitorio Cargnini (lcargnini) 
<[email protected]> wrote:
Thanks Mohammad. I made some changes and tried again. It worked, but for some 
reason it simply dies after a while, and I am not sure why.

I got the following message on my terminal:
     0: global: DistIface::startup() done
info: Entering event queue @ 0.  Starting simulation...
panic: panic condition recv_tick <= curTick() occurred: Simulators out of sync 
- missed packet receive by 771635016399 ticks(rev_recv_tick: 0 send_tick: 
4428354726000 send_delay: 257601 linkDelay: 10000000 )
Memory Usage: 402472 KBytes
Program aborted at tick 5200000000000




On log.switch this is what I got:

**** REAL SIMULATION ****
      0: system.portlink0: DistEtherLink::startup() called
      0: global: DistIface::startup() started
info: Dist sync scheduled at 5200000000000 and repeats       0: global: 
DistIface::startup() done
10000000
      0: system.portlink1: DistEtherLink::startup() called
      0: global: DistIface::startup() started
      0: global: DistIface::startup() done
      0: system.portlink2: DistEtherLink::startup() called
      0: global: DistIface::startup() started
      0: global: DistIface::startup() done
      0: system.portlink3: DistEtherLink::startup() called
      0: global: DistIface::startup() started
      0: global: DistIface::startup() done
      0: system.portlink4: DistEtherLink::startup() called
      0: global: DistIface::startup() started
      0: global: DistIface::startup() done
      0: system.portlink5: DistEtherLink::startup() called
      0: global: DistIface::startup() started
      0: global: DistIface::startup() done
      0: system.portlink6: DistEtherLink::startup() called
      0: global: DistIface::startup() started
      0: global: DistIface::startup() done
      0: system.portlink7: DistEtherLink::startup() called
      0: global: DistIface::startup() started
      0: global: DistIface::startup() done
info: Entering event queue @ 0.  Starting simulation...
panic: panic condition recv_tick <= curTick() occurred: Simulators out of sync 
- missed packet receive by 771635016399 ticks(rev_recv_tick: 0 send_tick: 
4428354726000 send_delay: 257601 linkDelay: 10000000 )
Memory Usage: 402472 KBytes
Program aborted at tick 5200000000000
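
The numbers in this panic message are self-consistent, which can be checked by hand. The sketch below assumes the expected receive tick is send_tick + send_delay + linkDelay; that formula is inferred from the values printed here and is not quoted from the gem5 source.

```python
# Values copied from the panic message and the abort line.
send_tick = 4428354726000
send_delay = 257601
link_delay = 10000000
abort_tick = 5200000000000   # "Program aborted at tick"

# Assumed formula for when the packet should have been received.
recv_tick = send_tick + send_delay + link_delay

# How far past the expected receive tick the simulation already was.
missed_by = abort_tick - recv_tick
print(recv_tick, missed_by)
```

The result for missed_by matches the "missed packet receive by" figure in the message, which is consistent with the first synchronization happening only at tick 5200000000000.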

From: gem5-users [mailto:[email protected]] On Behalf Of Mohammad Alian
Sent: Thursday, December 7, 2017 10:00 AM
To: gem5 users mailing list <[email protected]>
Subject: Re: [gem5-users] [EXT] Re: Running Dist-gem5

Please look at the contents of log.*, not m5out.*/stats.txt. It's not surprising 
that stats.txt is empty ...

On Thu, Dec 7, 2017 at 11:55 AM, Vitorio Cargnini (lcargnini) 
<[email protected]> wrote:
Hi,

The m5out.*/stats.txt files from all nodes are empty.

However, m5out.switch/config.ini is populated. It has one section like the 
following for each of portlink0 through portlink7:
[system.portlink7]
type=DistEtherLink
delay=10000000
delay_var=0
dist_rank=0
dist_size=8
dist_sync_on_pseudo_op=false
dump=Null
eventq_index=0
is_switch=true
num_nodes=8
server_name=127.0.0.1
server_port=2200
speed=800.000000
sync_repeat=0
sync_start=5200000000000
int0=system.interface[7]
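
To inspect the sync and server fields of every portlink section at once, a short Python sketch with the standard configparser module could be used. The sample text is a trimmed copy of the section above; whether the full m5out.switch/config.ini parses cleanly this way is an assumption, and the function name is made up for illustration.

```python
import configparser

# Trimmed copy of the section shown above, used as sample input.
SAMPLE = """\
[system.portlink7]
type=DistEtherLink
delay=10000000
dist_rank=0
dist_size=8
server_name=127.0.0.1
server_port=2200
sync_repeat=0
sync_start=5200000000000
"""

def dist_link_settings(ini_text):
    """Map each DistEtherLink section name to its sync_start and server address."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    result = {}
    for name in parser.sections():
        section = parser[name]
        if section.get("type") == "DistEtherLink":
            result[name] = {
                "sync_start": int(section["sync_start"]),
                "server": "%s:%s" % (section["server_name"], section["server_port"]),
            }
    return result

settings = dist_link_settings(SAMPLE)
print(settings)
```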

I’m wondering whether the server_name could be the problem…


From: gem5-users [mailto:[email protected]] On Behalf Of Mohammad Alian
Sent: Wednesday, December 6, 2017 4:28 PM
To: gem5 users mailing list <[email protected]>
Subject: Re: [gem5-users] [EXT] Re: Running Dist-gem5

Again you need to look at log.* to find out why the simulation gets killed. 
Don't only look at log.switch. If one of the gem5 processes aborts then the 
entire dist-gem5 simulation will be killed.
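
As a rough aid for checking every log at once, the following Python sketch scans all log.* files in a run directory for lines that look like failures. The keyword list is taken from messages quoted in this thread plus the usual "fatal:" prefix, and both it and the function name are assumptions to adapt as needed.

```python
import glob
import os

# Markers that appear in failing logs in this thread, plus "fatal:".
KEYWORDS = ("panic:", "fatal:", "Traceback", "Killed by signal")

def find_failures(run_dir):
    """Return (file name, line number, text) for suspicious lines in log.* files."""
    hits = []
    for path in sorted(glob.glob(os.path.join(run_dir, "log.*"))):
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(word in line for word in KEYWORDS):
                    hits.append((os.path.basename(path), number, line.rstrip()))
    return hits
```

Calling find_failures on the directory that holds log.switch and the per-node logs, then printing each tuple, shows which process reported a problem first.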

On Wed, Dec 6, 2017 at 1:50 PM, Vitorio Cargnini (lcargnini) 
<[email protected]> wrote:
Hi Mohammad,

Thank you for the prompt response. I checked log.switch, and the first error I 
fixed was the path: the script needs full paths to work. Once I tried again, it 
executed and failed a little later.

Got the following output:
launch switch gem5 process on node0 ...
waiting for switch to start ..
node #switch started
START Wed Dec  6 12:36:04 MST 2017
starting gem5 on node0...
starting gem5 on node0...
starting gem5 on node1...
starting gem5 on node1...
starting gem5 on node2 ...
starting gem5 on node2 ...
starting gem5 on node3 ...
starting gem5 on node3 ...
(I) (some) gem5 process(es) exited
KILLED Wed Dec  6 12:37:35 MST 2017
ABORT Wed Dec  6 12:37:35 MST 2017

The log.switch had the following:
command line: /wada/wada/gem5/build/ARM/gem5.opt -d 
/wada/wada/gem5/m5out.switch --debug-flags=DistEthernet 
/wada/wada/gem5/configs/dist/sw.py 
--checkpoint-dir=/wada/wada/gem5/m5out.switch --is-switch --dist-size=8 
--dist-server-port=2200

info: Standard input is not a terminal, disabling listeners.
Global frequency set at 1000000000000 ticks per second
      0: system.portlink0: DistEtherLink::DistEtherLink() link delay:10000000 
ticksPerByte:800
      0: global: DistIface() ctor rank:0
info: tcp_iface listening on port 2200
Killed by signal 15.

From: gem5-users [mailto:[email protected]] On Behalf Of Mohammad Alian
Sent: Tuesday, December 5, 2017 9:18 PM
To: gem5 users mailing list <[email protected]>
Subject: [EXT] Re: [gem5-users] Running Dist-gem5

Hi Vitorio,

You should check the content of log.switch to see why the gem5 process 
simulating the switch cannot start. There can be many reasons that a gem5 
process fails to run. If you post the content of log.switch here then I can help.

Regarding "distributed run", you first need to set up passwordless ssh between 
your simulation (physical) hosts and then use the "LSB_MCPU_HOSTS" env variable 
to assign gem5 processes to physical hosts. E.g. if your simulated cluster size 
is 8 and you want to run 4 gem5 processes on host_name0 and 4 on host_name1, 
then your LSB_MCPU_HOSTS looks like this:

export LSB_MCPU_HOSTS="host_name0 4 host_name1 4"
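
As a sanity check on that format, the Python sketch below splits such a string into host and count pairs and adds up the counts, so the total can be compared with the simulated cluster size. How the launch script itself reads the variable is not shown in this thread; the parsing here is an assumption based only on the example above, and the function name is hypothetical.

```python
def parse_host_slots(value):
    """Parse 'host count host count ...' into a list of (host, count) pairs."""
    fields = value.split()
    if len(fields) % 2 != 0:
        raise ValueError("expected alternating host names and counts")
    return [(fields[i], int(fields[i + 1])) for i in range(0, len(fields), 2)]

pairs = parse_host_slots("host_name0 4 host_name1 4")
total = sum(count for _, count in pairs)
print(pairs, total)   # total should equal the simulated cluster size
```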


Best,
Mohammad


On Tue, Dec 5, 2017 at 6:03 PM, Vitorio Cargnini (lcargnini) 
<[email protected]> wrote:
Hello,

Please, what exactly do I need in order to run dist-gem5 with --dist?

I’m trying, but it fails with “Failed to start switch”.

Also, what do I need in place for it to start distributed across nodes, instead 
of launching multiple parallel runs on ‘localhost’?

Regards,
Vitorio.









_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

