Hi all,
Does anyone know how to solve this error?
SLURM_JOB_NODELIST=base-node,compute-node
Sending 400 rows to task 15 offset=5600
[cli_0]: write_line error; fd=6 buf=:cmd=get kvsname=kvs_9638_0
key=P15-businesscard
:
system msg for write_line failure : Bad file descriptor
[cli_0]: write_line error; fd=6 buf=:cmd=get kvsname=kvs_9638_0
key=P15-businesscard
:
system msg for write_line failure : Bad file descriptor
Fatal error in MPI_Send: Other MPI error, error stack:
MPI_Send(193)....................: MPI_Send(buf=0x7fff125ab780, count=1, MPI_INT, dest=15, tag=1, MPI_COMM_WORLD) failed
MPIDI_EagerContigSend(255).......: failure occurred while attempting to send an eager message
MPID_nem_tcp_iStartContigMsg(298):
MPID_nem_tcp_connect(846)........:
getConnInfoKVS(678)..............: PMI_KVS_Get failed
I was trying to restart an MPI matrix multiplication run with size 6000 x 6000.
I am using Ubuntu 14.04 and MVAPICH2-2.2b, and I submit the job through the SLURM resource manager.
These are the two commands I use to submit and restart the job:
mpirun dmtcp_launch -h $h -p $p ./mm.o 6000 (submit)
./dmtcp_restart_script.sh (restart)
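
For context, the log line "Sending 400 rows to task 15 offset=5600" is consistent with a master rank handing out contiguous row blocks to 15 workers (6000 / 15 = 400 rows each, so the last block starts at row 5600). Below is a minimal sketch of that partitioning, assuming 16 ranks in total (one master plus 15 workers). The function name partition_rows is hypothetical and this is only an illustration, not the actual mm source:

```python
def partition_rows(n_rows, n_workers):
    """Split n_rows into contiguous blocks, one per worker rank (1..n_workers).

    Returns a list of (rank, rows, offset) tuples. Any remainder rows are
    given one each to the lowest-numbered workers.
    """
    base, extra = divmod(n_rows, n_workers)
    blocks = []
    offset = 0
    for rank in range(1, n_workers + 1):
        rows = base + 1 if rank <= extra else base
        blocks.append((rank, rows, offset))
        offset += rows
    return blocks


if __name__ == "__main__":
    # Assumed setup: 6000 x 6000 matrix, 15 worker ranks.
    for rank, rows, offset in partition_rows(6000, 15):
        print(f"Sending {rows} rows to task {rank} offset={offset}")
```

Under these assumptions the failing MPI_Send (dest=15, tag=1) would be the first message to the last worker, which is where the restarted process has to look up that rank's connection info through PMI.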
Thank you in advance
Regards,
Husen
--
Post Graduate Student
Faculty of Computer Science
University of Indonesia
Depok