As an addendum: the calculation may simply be too big for a single node. How much memory does each node have, and what are your RKMAX, smallest RMT, and unit cell size? Maybe use something like this in your .machines file:

1:z1-2:16 z1-13:16
lapw0: z1-2:16 z1-13:16
granularity:1
extrafine:1

Check the matrix size using

x lapw1 -c -p -nmat_only
cat *.nmat
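As a back-of-the-envelope sketch of what to do with that number, assuming the complex case (lapw1 holds two NMAT x NMAT double-complex matrices, H and S, at 16 bytes per element) and assuming each *.nmat file lists the matrix dimension as its first number (both are assumptions; check your output):

# Keep the largest matrix dimension found across the *.nmat files and
# estimate the memory for H and S, in total and per node for the
# two-node, 16-ranks-per-node layout used in this thread.
awk 'NF && $1+0 > nmat { nmat = $1+0 }
     END { gb = 2 * 16 * nmat * nmat / 1e9
           printf "H+S: ~%.1f GB total, ~%.1f GB per node on 2 nodes\n", gb, gb/2 }' *.nmat

If the per-node share approaches the physical memory of a node, reduce RKMAX or spread the job over more nodes.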
___________________________
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu
MURI4D.numis.northwestern.edu
Co-Editor, Acta Cryst A
"Research is to see what everybody else has seen, and to think what nobody else has thought"
Albert Szent-Györgyi

On Apr 28, 2015 10:45 PM, "Laurence Marks" <l-ma...@northwestern.edu> wrote:

> Unfortunately it is hard to know what is going on. A google search on
> "Error while reading PMI socket." indicates that the message you have
> means it did not work, and is not specific. Some suggestions:
>
> a) Try mpiexec (slightly different arguments); you just edit
> parallel_options. See
> https://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager
> (a sketch of the edited line follows after this list).
> b) Try an older version of mvapich2 if it is on the system.
> c) Do you have to launch mpdboot on your system? See
> https://wiki.calculquebec.ca/w/MVAPICH2/en
> d) Talk to a sys_admin, particularly the one who set up mvapich.
> e) Do "cat *.error"; maybe something else went wrong, or it is not
> mpi's fault but a user error.
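A minimal sketch of suggestion (a), assuming Hydra's mpiexec ships with the same mvapich2 installation (the path is a guess; check the bin directory of your install):

# Hypothetical parallel_options line. _NP_, _HOSTS_ and _EXEC_ are the
# placeholders WIEN2k substitutes at run time; relative to the mpirun
# line quoted below, only the launcher and the hostfile flag (-f) change.
setenv WIEN_MPIRUN "/usr/local/mvapich2-icc/bin/mpiexec -np _NP_ -f _HOSTS_ _EXEC_"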
> On Apr 28, 2015 10:17 PM, "lung Fermin" <ferminl...@gmail.com> wrote:
>
>> Thanks for Prof. Marks' comment.
>>
>> 1. In the previous email, I missed copying the line
>>
>> setenv WIEN_MPIRUN "/usr/local/mvapich2-icc/bin/mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_"
>>
>> It was in parallel_options. Sorry about that.
>>
>> 2. I have checked that the running program was lapw1c_mpi. Besides, when
>> the mpi calculation was done on a single node for some other system, the
>> results were consistent with the literature. So I believe that the mpi
>> code has been set up and compiled properly.
>>
>> Could there be something wrong with my options in siteconfig? Do I have
>> to set some command to bind the job? Any other possible cause of the error?
>>
>> Any suggestions or comments would be appreciated. Thanks.
>>
>> Regards,
>> Fermin
>>
>> ----------------------------------------------------------------------------------------------------
>>
>> You appear to be missing the line
>>
>> setenv WIEN_MPIRUN=...
>>
>> This is set up when you run siteconfig, and provides the information on
>> how mpi is run on your system.
>>
>> N.B., did you set up and compile the mpi code?
>>
>> On Apr 28, 2015 4:22 AM, "lung Fermin" <ferminl...@gmail.com> wrote:
>>
>> Dear Wien2k community,
>>
>> I am trying to perform a calculation on a system of ~100 inequivalent
>> atoms using mpi + k-point parallelization on a cluster. Everything goes
>> fine when the program is run on a single node. However, if I perform the
>> calculation across different nodes, the following error occurs. How can
>> I solve this problem? I am a newbie to mpi programming; any help would
>> be appreciated. Thanks.
>>
>> The error message (MVAPICH2 2.0a):
>>
>> ---------------------------------------------------------------------------------------------------
>>
>> Warning: no access to tty (Bad file descriptor).
>> Thus no job control in this shell.
>>
>> z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2
>> z1-2 z1-2 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13
>> z1-13 z1-13 z1-13 z1-13 z1-13 z1-13
>>
>> number of processors: 32
>> LAPW0 END
>>
>> [z1-2:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node z1-13
>> aborted: Error while reading a PMI socket (4)
>> [z1-13:mpispawn_0][child_handler] MPI process (rank: 11, pid: 8546)
>> terminated with signal 9 -> abort job
>> [z1-13:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 8.
>> MPI process died?
>> [z1-13:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI
>> process died?
>> [z1-2:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 12.
>> MPI process died?
>> [z1-2:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI
>> process died?
>> [z1-2:mpispawn_0][child_handler] MPI process (rank: 0, pid: 35454)
>> terminated with signal 9 -> abort job
>> [z1-2:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node z1-2
>> aborted: MPI process error (1)
>> [cli_15]: aborting job:
>> application called MPI_Abort(MPI_COMM_WORLD, 0) - process 15
>>
>> > stop error
>>
>> ------------------------------------------------------------------------------------------------------
>>
>> The .machines file:
>>
>> #
>> 1:z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2 z1-2
>> z1-2 z1-2
>> 1:z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13 z1-13
>> z1-13 z1-13 z1-13 z1-13 z1-13
>> granularity:1
>> extrafine:1
>>
>> --------------------------------------------------------------------------------------------------------
>>
>> The parallel_options:
>>
>> setenv TASKSET "no"
>> setenv USE_REMOTE 0
>> setenv MPI_REMOTE 1
>> setenv WIEN_GRANULARITY 1
>>
>> --------------------------------------------------------------------------------------------------------
>>
>> Thanks.
>>
>> Regards,
>> Fermin
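One more hedged sketch for narrowing this down: before touching WIEN2k again, check that mpi itself can launch across the two nodes, reusing the mpirun path and hostfile mechanism quoted above (hostnames and install path are taken from this thread; adjust them to your cluster):

# Write a two-node hostfile and launch a trivial command on 4 ranks.
cat > hosts.test << EOF
z1-2
z1-13
EOF
/usr/local/mvapich2-icc/bin/mpirun -np 4 -hostfile hosts.test hostname

Each rank should print its node's name. If this already hangs or dies with PMI socket errors, the mvapich2 setup or the node-to-node ssh configuration is at fault rather than lapw1_mpi, which points back at suggestions (c) and (d) above.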