Just a user account would be enough. It looks like a bug in the ssh plugin or in the network layer; that is the only guess I can make so far.

Also, to get more debug info, you can compile DMTCP with the "--enable-debug" configure option. This will produce more verbose output from both the coordinator and the compute processes.
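(A minimal sketch of such a debug build, assuming the usual configure/make flow from the DMTCP source tree and keeping the infiniband flag Manuel already uses; the install prefix is only an example inferred from the paths in the logs below:)

---
# ./configure --prefix=/home/localsoft/dmtcp --enable-debug --enable-infiniband-support
# make && make install
---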
Best,
Jiajun

On Mon, Oct 19, 2015 at 1:56 PM, Manuel Rodríguez Pascual <manuel.rodriguez.pasc...@gmail.com> wrote:

> I will ask about the guest account, but I am not very optimistic about it.
>
> What do you need exactly? Just a user account on the system, no root.
> Correct?
>
> (Feel free to continue this conversation outside the dmtcp-forum mailing
> list if you consider it irrelevant for the community.)
>
> 2015-10-19 10:52 GMT-07:00 Jiajun Cao <jia...@ccs.neu.edu>:
>
>> Also, if possible, could you offer us a guest account on your cluster?
>> Compared to email communication, that would be a more efficient way to debug.
>>
>> Best,
>> Jiajun
>>
>> On Mon, Oct 19, 2015 at 11:26 AM, Jiajun Cao <jia...@ccs.neu.edu> wrote:
>>
>>> Hi Manuel,
>>>
>>> The infiniband plugin shouldn't affect application launching. Could you
>>> try removing the "--ib" flag and see if the application still crashes? This
>>> can help diagnose whether the issue is in the ib plugin or in other DMTCP
>>> modules.
>>>
>>> Best,
>>> Jiajun
>>>
>>> On Sun, Oct 18, 2015 at 10:57 PM, Kapil Arya <ka...@ccs.neu.edu> wrote:
>>>
>>>> Hey Jiajun,
>>>>
>>>> Can you take a look at this problem, as it is closer to your area of
>>>> expertise :-).
>>>>
>>>> Best,
>>>> Kapil
>>>>
>>>> On Sat, Oct 17, 2015 at 11:31 PM, Manuel Rodríguez Pascual <manuel.rodriguez.pasc...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am trying to checkpoint an MVAPICH application. It does not behave
>>>>> as expected, so maybe you can give me some support.
>>>>>
>>>>> I have compiled DMTCP with "--enable-infiniband-support" as the only
>>>>> flag. I have MVAPICH installed.
>>>>>
>>>>> I can execute a test MPI application on two nodes without DMTCP. I can
>>>>> also execute the application on a single node with DMTCP. However, if I
>>>>> execute it on two nodes with DMTCP, only the first one will run.
>>>>>
>>>>> Below is a series of test commands with their output, together with
>>>>> the versions of everything involved.
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Thanks for your help,
>>>>>
>>>>> Manuel
>>>>>
>>>>> ---
>>>>> # mpichversion
>>>>> MVAPICH2 Version: 2.2a
>>>>> MVAPICH2 Release date: Mon Aug 17 20:00:00 EDT 2015
>>>>> MVAPICH2 Device: ch3:mrail
>>>>> MVAPICH2 configure: --disable-mcast
>>>>> MVAPICH2 CC: gcc -DNDEBUG -DNVALGRIND -O2
>>>>> MVAPICH2 CXX: g++ -DNDEBUG -DNVALGRIND -O2
>>>>> MVAPICH2 F77: gfortran -L/lib -L/lib -O2
>>>>> MVAPICH2 FC: gfortran -O2
>>>>>
>>>>> # dmtcp_coordinator --version
>>>>> dmtcp_coordinator (DMTCP) 2.4.1
>>>>> ---
>>>>>
>>>>> I can execute a test MPI application on two nodes (acme11 and acme12):
>>>>>
>>>>> ---
>>>>> # mpirun_rsh -n 2 acme11 acme12 ./helloWorldMPI
>>>>> Process 0 of 2 is on acme11.ciemat.es
>>>>> Process 1 of 2 is on acme12.ciemat.es
>>>>> Hello world from process 0 of 2
>>>>> Hello world from process 1 of 2
>>>>> Goodbye world from process 0 of 2
>>>>> Goodbye world from process 1 of 2
>>>>> ---
>>>>>
>>>>> As you can see, it works correctly.
>>>>>
>>>>> If I try to execute the application with DMTCP, however, it does not.
>>>>> I run the coordinator on acme11, listening on port 7779.
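For reference, the coordinator invocation itself is not shown in the thread; on acme11 it would have been started with something along the lines of:

---
# dmtcp_coordinator --port 7779
---

And Jiajun's suggestion above, testing without the infiniband plugin, amounts to re-running the failing two-node launch shown further below with the "--ib" flag removed:

---
# dmtcp_launch -h acme11 -p 7779 mpirun_rsh -n 2 acme11 acme12 ./helloWorldMPI
---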
>>>>> I can execute the application on a single node. For example:
>>>>>
>>>>> ---
>>>>> # dmtcp_launch -h acme11 -p 7779 --ib mpirun_rsh -n 1 acme12 ./helloWorldMPI
>>>>>
>>>>> [41000] NOTE at ssh.cpp:369 in prepareForExec; REASON='New ssh command'
>>>>>   newCommand = /home/localsoft/dmtcp/bin/dmtcp_ssh /home/localsoft/dmtcp/bin/dmtcp_nocheckpoint /usr/bin/ssh -q acme12 cd /home/slurm/tests;/home/localsoft/dmtcp/bin/dmtcp_launch --coord-host 172.17.29.173 --coord-port 7779 --ckptdir /home/slurm/tests --infiniband /home/localsoft/dmtcp/bin/dmtcp_sshd /usr/bin/env MPISPAWN_MPIRUN_MPD=0 USE_LINEAR_SSH=1 MPISPAWN_MPIRUN_HOST=acme11.ciemat.es MPISPAWN_MPIRUN_HOSTIP=172.17.29.173 MPIRUN_RSH_LAUNCH=1 MPISPAWN_CHECKIN_PORT=33687 MPISPAWN_MPIRUN_PORT=33687 MPISPAWN_NNODES=1 MPISPAWN_GLOBAL_NPROCS=1 MPISPAWN_MPIRUN_ID=40000 MPISPAWN_ARGC=1 MPDMAN_KVS_TEMPLATE=kvs_885_acme11.ciemat.es_40000 MPISPAWN_LOCAL_NPROCS=1 MPISPAWN_ARGV_0='./helloWorldMPI' MPISPAWN_ARGC=1 MPISPAWN_GENERIC_ENV_COUNT=0 MPISPAWN_ID=0 MPISPAWN_WORKING_DIR=/home/slurm/tests MPISPAWN_MPIRUN_RANK_0=0 /usr/local/bin/mpispawn 0
>>>>>
>>>>> Process 0 of 1 is on acme12.ciemat.es
>>>>> Hello world from process 0 of 1
>>>>> Goodbye world from process 0 of 1
>>>>>
>>>>> COORDINATOR OUTPUT
>>>>>
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:1079 in onConnect; REASON='worker connected'
>>>>>   hello_remote.from = 1d64b124afe30f29-4029-562310a2
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:867 in onData; REASON='Updating process Information after exec()'
>>>>>   progname = mpirun_rsh
>>>>>   msg.from = 1d64b124afe30f29-52000-562310a2
>>>>>   client->identity() = 1d64b124afe30f29-4029-562310a2
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:1079 in onConnect; REASON='worker connected'
>>>>>   hello_remote.from = 1d64b124afe30f29-52000-562310a2
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:858 in onData; REASON='Updating process Information after fork()'
>>>>>   client->hostname() = acme11.ciemat.es
>>>>>   client->progname() = mpirun_rsh_(forked)
>>>>>   msg.from = 1d64b124afe30f29-53000-562310a2
>>>>>   client->identity() = 1d64b124afe30f29-52000-562310a2
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:1079 in onConnect; REASON='worker connected'
>>>>>   hello_remote.from = 1d64b124afe30f29-53000-562310a2
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:858 in onData; REASON='Updating process Information after fork()'
>>>>>   client->hostname() = acme11.ciemat.es
>>>>>   client->progname() = dmtcp_ssh_(forked)
>>>>>   msg.from = 1d64b124afe30f29-54000-562310a2
>>>>>   client->identity() = 1d64b124afe30f29-53000-562310a2
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:917 in onDisconnect; REASON='client disconnected'
>>>>>   client->identity() = 1d64b124afe30f29-54000-562310a2
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:867 in onData; REASON='Updating process Information after exec()'
>>>>>   progname = dmtcp_ssh
>>>>>   msg.from = 1d64b124afe30f29-53000-562310a2
>>>>>   client->identity() = 1d64b124afe30f29-53000-562310a2
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:1079 in onConnect; REASON='worker connected'
>>>>>   hello_remote.from = 1b69d09fb3238b30-23945-562310a2
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:867 in onData; REASON='Updating process Information after exec()'
>>>>>   progname = dmtcp_sshd
>>>>>   msg.from = 1b69d09fb3238b30-55000-562310a2
>>>>>   client->identity() = 1b69d09fb3238b30-23945-562310a2
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:1079 in onConnect; REASON='worker connected'
>>>>>   hello_remote.from = 1b69d09fb3238b30-55000-562310a2
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:858 in onData; REASON='Updating process Information after fork()'
>>>>>   client->hostname() = acme12.ciemat.es
>>>>>   client->progname() = dmtcp_sshd_(forked)
>>>>>   msg.from = 1b69d09fb3238b30-56000-562310a2
>>>>>   client->identity() = 1b69d09fb3238b30-55000-562310a2
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:1079 in onConnect; REASON='worker connected'
>>>>>   hello_remote.from = 1b69d09fb3238b30-56000-562310a2
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:858 in onData; REASON='Updating process Information after fork()'
>>>>>   client->hostname() = acme12.ciemat.es
>>>>>   client->progname() = mpispawn_(forked)
>>>>>   msg.from = 1b69d09fb3238b30-57000-562310a2
>>>>>   client->identity() = 1b69d09fb3238b30-56000-562310a2
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:867 in onData; REASON='Updating process Information after exec()'
>>>>>   progname = env
>>>>>   msg.from = 1b69d09fb3238b30-56000-562310a2
>>>>>   client->identity() = 1b69d09fb3238b30-56000-562310a2
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:867 in onData; REASON='Updating process Information after exec()'
>>>>>   progname = mpispawn
>>>>>   msg.from = 1b69d09fb3238b30-56000-562310a2
>>>>>   client->identity() = 1b69d09fb3238b30-56000-562310a2
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:867 in onData; REASON='Updating process Information after exec()'
>>>>>   progname = helloWorldMPI
>>>>>   msg.from = 1b69d09fb3238b30-57000-562310a2
>>>>>   client->identity() = 1b69d09fb3238b30-57000-562310a2
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:917 in onDisconnect; REASON='client disconnected'
>>>>>   client->identity() = 1b69d09fb3238b30-57000-562310a2
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:917 in onDisconnect; REASON='client disconnected'
>>>>>   client->identity() = 1b69d09fb3238b30-56000-562310a2
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:917 in onDisconnect; REASON='client disconnected'
>>>>>   client->identity() = 1b69d09fb3238b30-55000-562310a2
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:917 in onDisconnect; REASON='client disconnected'
>>>>>   client->identity() = 1d64b124afe30f29-53000-562310a2
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:917 in onDisconnect; REASON='client disconnected'
>>>>>   client->identity() = 1d64b124afe30f29-52000-562310a2
>>>>> ---
>>>>>
>>>>> So we see that it is working correctly, connecting and so on.
>>>>>
>>>>> However, if I run the application on more than one node, as in the
>>>>> first example, it crashes. What happens is that the first node on the
>>>>> node list executes the application, and the rest do not.
>>>>> ----
>>>>> [root@acme11 tests]# dmtcp_launch -h acme11 -p 7779 --ib mpirun_rsh -n 2 acme11 acme12 ./helloWorldMPI
>>>>>
>>>>> [59000] NOTE at ssh.cpp:369 in prepareForExec; REASON='New ssh command'
>>>>>   newCommand = /home/localsoft/dmtcp/bin/dmtcp_ssh /home/localsoft/dmtcp/bin/dmtcp_nocheckpoint /usr/bin/ssh -q acme11 cd /home/slurm/tests;/home/localsoft/dmtcp/bin/dmtcp_launch --coord-host 172.17.29.173 --coord-port 7779 --ckptdir /home/slurm/tests --infiniband /home/localsoft/dmtcp/bin/dmtcp_sshd /usr/bin/env MPISPAWN_MPIRUN_MPD=0 USE_LINEAR_SSH=1 MPISPAWN_MPIRUN_HOST=acme11.ciemat.es MPISPAWN_MPIRUN_HOSTIP=172.17.29.173 MPIRUN_RSH_LAUNCH=1 MPISPAWN_CHECKIN_PORT=34203 MPISPAWN_MPIRUN_PORT=34203 MPISPAWN_NNODES=2 MPISPAWN_GLOBAL_NPROCS=2 MPISPAWN_MPIRUN_ID=58000 MPISPAWN_ARGC=1 MPDMAN_KVS_TEMPLATE=kvs_481_acme11.ciemat.es_58000 MPISPAWN_LOCAL_NPROCS=1 MPISPAWN_ARGV_0='./helloWorldMPI' MPISPAWN_ARGC=1 MPISPAWN_GENERIC_ENV_COUNT=0 MPISPAWN_ID=0 MPISPAWN_WORKING_DIR=/home/slurm/tests MPISPAWN_MPIRUN_RANK_0=0 /usr/local/bin/mpispawn 0
>>>>>
>>>>> [60000] NOTE at ssh.cpp:369 in prepareForExec; REASON='New ssh command'
>>>>>   newCommand = /home/localsoft/dmtcp/bin/dmtcp_ssh /home/localsoft/dmtcp/bin/dmtcp_nocheckpoint /usr/bin/ssh -q acme12 cd /home/slurm/tests;/home/localsoft/dmtcp/bin/dmtcp_launch --coord-host 172.17.29.173 --coord-port 7779 --ckptdir /home/slurm/tests --infiniband /home/localsoft/dmtcp/bin/dmtcp_sshd /usr/bin/env MPISPAWN_MPIRUN_MPD=0 USE_LINEAR_SSH=1 MPISPAWN_MPIRUN_HOST=acme11.ciemat.es MPISPAWN_MPIRUN_HOSTIP=172.17.29.173 MPIRUN_RSH_LAUNCH=1 MPISPAWN_CHECKIN_PORT=34203 MPISPAWN_MPIRUN_PORT=34203 MPISPAWN_NNODES=2 MPISPAWN_GLOBAL_NPROCS=2 MPISPAWN_MPIRUN_ID=58000 MPISPAWN_ARGC=1 MPDMAN_KVS_TEMPLATE=kvs_481_acme11.ciemat.es_58000 MPISPAWN_LOCAL_NPROCS=1 MPISPAWN_ARGV_0='./helloWorldMPI' MPISPAWN_ARGC=1 MPISPAWN_GENERIC_ENV_COUNT=0 MPISPAWN_ID=1 MPISPAWN_WORKING_DIR=/home/slurm/tests MPISPAWN_MPIRUN_RANK_0=1 /usr/local/bin/mpispawn 0
>>>>>
>>>>> Process 0 of 2 is on acme11.ciemat.es
>>>>> Hello world from process 0 of 2
>>>>> Goodbye world from process 0 of 2
>>>>>
>>>>> COORDINATOR OUTPUT
>>>>>
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:1079 in onConnect; REASON='worker connected'
>>>>>   hello_remote.from = 1d64b124afe30f29-4070-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:867 in onData; REASON='Updating process Information after exec()'
>>>>>   progname = mpirun_rsh
>>>>>   msg.from = 1d64b124afe30f29-58000-56231173
>>>>>   client->identity() = 1d64b124afe30f29-4070-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:1079 in onConnect; REASON='worker connected'
>>>>>   hello_remote.from = 1d64b124afe30f29-58000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:1079 in onConnect; REASON='worker connected'
>>>>>   hello_remote.from = 1d64b124afe30f29-58000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:858 in onData; REASON='Updating process Information after fork()'
>>>>>   client->hostname() = acme11.ciemat.es
>>>>>   client->progname() = mpirun_rsh_(forked)
>>>>>   msg.from = 1d64b124afe30f29-59000-56231173
>>>>>   client->identity() = 1d64b124afe30f29-58000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:858 in onData; REASON='Updating process Information after fork()'
>>>>>   client->hostname() = acme11.ciemat.es
>>>>>   client->progname() = mpirun_rsh_(forked)
>>>>>   msg.from = 1d64b124afe30f29-60000-56231173
>>>>>   client->identity() = 1d64b124afe30f29-58000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:1079 in onConnect; REASON='worker connected'
>>>>>   hello_remote.from = 1d64b124afe30f29-59000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:1079 in onConnect; REASON='worker connected'
>>>>>   hello_remote.from = 1d64b124afe30f29-60000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:858 in onData; REASON='Updating process Information after fork()'
>>>>>   client->hostname() = acme11.ciemat.es
>>>>>   client->progname() = dmtcp_ssh_(forked)
>>>>>   msg.from = 1d64b124afe30f29-61000-56231173
>>>>>   client->identity() = 1d64b124afe30f29-59000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:858 in onData; REASON='Updating process Information after fork()'
>>>>>   client->hostname() = acme11.ciemat.es
>>>>>   client->progname() = dmtcp_ssh_(forked)
>>>>>   msg.from = 1d64b124afe30f29-62000-56231173
>>>>>   client->identity() = 1d64b124afe30f29-60000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:917 in onDisconnect; REASON='client disconnected'
>>>>>   client->identity() = 1d64b124afe30f29-61000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:917 in onDisconnect; REASON='client disconnected'
>>>>>   client->identity() = 1d64b124afe30f29-62000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:867 in onData; REASON='Updating process Information after exec()'
>>>>>   progname = dmtcp_ssh
>>>>>   msg.from = 1d64b124afe30f29-59000-56231173
>>>>>   client->identity() = 1d64b124afe30f29-59000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:867 in onData; REASON='Updating process Information after exec()'
>>>>>   progname = dmtcp_ssh
>>>>>   msg.from = 1d64b124afe30f29-60000-56231173
>>>>>   client->identity() = 1d64b124afe30f29-60000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:1079 in onConnect; REASON='worker connected'
>>>>>   hello_remote.from = 1b69d09fb3238b30-24001-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:1079 in onConnect; REASON='worker connected'
>>>>>   hello_remote.from = 1d64b124afe30f29-4094-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:867 in onData; REASON='Updating process Information after exec()'
>>>>>   progname = dmtcp_sshd
>>>>>   msg.from = 1d64b124afe30f29-64000-56231173
>>>>>   client->identity() = 1d64b124afe30f29-4094-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:867 in onData; REASON='Updating process Information after exec()'
>>>>>   progname = dmtcp_sshd
>>>>>   msg.from = 1b69d09fb3238b30-63000-56231173
>>>>>   client->identity() = 1b69d09fb3238b30-24001-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:1079 in onConnect; REASON='worker connected'
>>>>>   hello_remote.from = 1d64b124afe30f29-64000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:1079 in onConnect; REASON='worker connected'
>>>>>   hello_remote.from = 1b69d09fb3238b30-63000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:858 in onData; REASON='Updating process Information after fork()'
>>>>>   client->hostname() = acme11.ciemat.es
>>>>>   client->progname() = dmtcp_sshd_(forked)
>>>>>   msg.from = 1d64b124afe30f29-65000-56231173
>>>>>   client->identity() = 1d64b124afe30f29-64000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:858 in onData; REASON='Updating process Information after fork()'
>>>>>   client->hostname() = acme12.ciemat.es
>>>>>   client->progname() = dmtcp_sshd_(forked)
>>>>>   msg.from = 1b69d09fb3238b30-66000-56231173
>>>>>   client->identity() = 1b69d09fb3238b30-63000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:867 in onData; REASON='Updating process Information after exec()'
>>>>>   progname = env
>>>>>   msg.from = 1d64b124afe30f29-65000-56231173
>>>>>   client->identity() = 1d64b124afe30f29-65000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:867 in onData; REASON='Updating process Information after exec()'
>>>>>   progname = mpispawn
>>>>>   msg.from = 1d64b124afe30f29-65000-56231173
>>>>>   client->identity() = 1d64b124afe30f29-65000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:1079 in onConnect; REASON='worker connected'
>>>>>   hello_remote.from = 1b69d09fb3238b30-66000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:1079 in onConnect; REASON='worker connected'
>>>>>   hello_remote.from = 1d64b124afe30f29-65000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:858 in onData; REASON='Updating process Information after fork()'
>>>>>   client->hostname() = acme11.ciemat.es
>>>>>   client->progname() = mpispawn_(forked)
>>>>>   msg.from = 1d64b124afe30f29-68000-56231173
>>>>>   client->identity() = 1d64b124afe30f29-65000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:858 in onData; REASON='Updating process Information after fork()'
>>>>>   client->hostname() = acme12.ciemat.es
>>>>>   client->progname() = mpispawn_(forked)
>>>>>   msg.from = 1b69d09fb3238b30-67000-56231173
>>>>>   client->identity() = 1b69d09fb3238b30-66000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:867 in onData; REASON='Updating process Information after exec()'
>>>>>   progname = env
>>>>>   msg.from = 1b69d09fb3238b30-66000-56231173
>>>>>   client->identity() = 1b69d09fb3238b30-66000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:867 in onData; REASON='Updating process Information after exec()'
>>>>>   progname = mpispawn
>>>>>   msg.from = 1b69d09fb3238b30-66000-56231173
>>>>>   client->identity() = 1b69d09fb3238b30-66000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:867 in onData; REASON='Updating process Information after exec()'
>>>>>   progname = helloWorldMPI
>>>>>   msg.from = 1d64b124afe30f29-68000-56231173
>>>>>   client->identity() = 1d64b124afe30f29-68000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:867 in onData; REASON='Updating process Information after exec()'
>>>>>   progname = helloWorldMPI
>>>>>   msg.from = 1b69d09fb3238b30-67000-56231173
>>>>>   client->identity() = 1b69d09fb3238b30-67000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:917 in onDisconnect; REASON='client disconnected'
>>>>>   client->identity() = 1d64b124afe30f29-68000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:917 in onDisconnect; REASON='client disconnected'
>>>>>   client->identity() = 1b69d09fb3238b30-67000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:917 in onDisconnect; REASON='client disconnected'
>>>>>   client->identity() = 1d64b124afe30f29-65000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:917 in onDisconnect; REASON='client disconnected'
>>>>>   client->identity() = 1b69d09fb3238b30-66000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:917 in onDisconnect; REASON='client disconnected'
>>>>>   client->identity() = 1d64b124afe30f29-64000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:917 in onDisconnect; REASON='client disconnected'
>>>>>   client->identity() = 1b69d09fb3238b30-63000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:917 in onDisconnect; REASON='client disconnected'
>>>>>   client->identity() = 1d64b124afe30f29-59000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:917 in onDisconnect; REASON='client disconnected'
>>>>>   client->identity() = 1d64b124afe30f29-60000-56231173
>>>>> [3984] NOTE at dmtcp_coordinator.cpp:917 in onDisconnect; REASON='client disconnected'
>>>>>   client->identity() = 1d64b124afe30f29-58000-56231173
>>>>> ----
>>>>>
>>>>> --
>>>>> Dr. Manuel Rodríguez-Pascual
>>>>> skype: manuel.rodriguez.pascual
>>>>> phone: (+34) 913466173 // (+34) 679925108
>>>>>
>>>>> CIEMAT-Moncloa
>>>>> Edificio 22, desp. 1.25
>>>>> Avenida Complutense, 40
>>>>> 28040- MADRID
>>>>> SPAIN