I have checked that case.vsp/vns are up-to-date. I guess lapw0_mpi runs properly.
I compiled the source codes with ifort; please find the linking options below:

current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -Dmkl_scalapack -traceback
current:FFTW_OPT:-DFFTW3 -I/usr/local/include
current:FFTW_LIBS:-lfftw3_mpi -lfftw3 -L/usr/local/lib
current:LDFLAGS:$(FOPT) -L/opt/intel/Compiler/11.1/046/mkl/lib/em64t -pthread
current:DPARALLEL:'-DParallel'
current:R_LIBS:-lmkl_lapack -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -lguide
current:RP_LIBS:-lmkl_scalapack_lp64 -lmkl_solver_lp64 -lmkl_blacs_intelmpi_lp64 $(R_LIBS)
current:MPIRUN:/usr/local/mvapich2-icc/bin/mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_
current:MKL_TARGET_ARCH:intel64

Is it ok to use -lmkl_blacs_intelmpi_lp64?

Thanks a lot for all the suggestions.

Regards,
Fermin

-----Original Message-----
From: wien-boun...@zeus.theochem.tuwien.ac.at [mailto:wien-boun...@zeus.theochem.tuwien.ac.at] On Behalf Of Peter Blaha
To: A Mailing list for WIEN2k users
Subject: Re: [Wien] Error in mpi+k point parallelization across multiple nodes

It seems as if lapw0_mpi runs properly ?? Please check that you have NEW (check the date with ls -als !!) and valid case.vsp/vns files, which can be used in e.g. a sequential lapw1 step. That would confirm that mpi and fftw are ok.

The problems seem to start in lapw1_mpi, and this program requires, in addition to mpi, also scalapack. I guess you compile with ifort and link with the mkl ?? There is one crucial blacs library, which must be adapted to your mpi, since these libraries are specific to a particular mpi (intelmpi, openmpi, ...): Which blacs library do you link? -lmkl_blacs_lp64 or another one ?? Check the documentation for the mkl.

On 04.05.2015 at 05:18, lung Fermin wrote:
> I have tried to set MPI_REMOTE=0 and used 32 cores (on 2 nodes) for
> distributing the mpi job. However, the problem still persists...
> but the error message looks different this time:
>
> $> cat *.error
> Error in LAPW2
> ** testerror: Error in Parallel LAPW2
>
> and the output on screen:
>
> Warning: no access to tty (Bad file descriptor).
> Thus no job control in this shell.
> z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17
> z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18
> number of processors: 32
> LAPW0 END
> [16] Failed to dealloc pd (Device or resource busy)
> [0] Failed to dealloc pd (Device or resource busy)
> [17] Failed to dealloc pd (Device or resource busy)
> [2] Failed to dealloc pd (Device or resource busy)
> [18] Failed to dealloc pd (Device or resource busy)
> [1] Failed to dealloc pd (Device or resource busy)
> LAPW1 END
> LAPW2 - FERMI; weighs written
> [z1-17:mpispawn_0][child_handler] MPI process (rank: 0, pid: 28291) terminated with signal 9 -> abort job
> [z1-17:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 9. MPI process died?
> [z1-17:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
> [z1-17:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node z1-17 aborted: Error while reading a PMI socket (4)
> [z1-18:mpispawn_1][read_size] Unexpected End-Of-File on file descriptor 21. MPI process died?
> [z1-18:mpispawn_1][read_size] Unexpected End-Of-File on file descriptor 21. MPI process died?
> [z1-18:mpispawn_1][handle_mt_peer] Error while reading PMI socket. MPI process died?
> cp: cannot stat `.in.tmp': No such file or directory
>
> stop error
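[Archive note] Peter's point that the MKL blacs library must match the MPI flavor can be sketched as a small shell helper. This is only an illustrative sketch, not part of the thread: the function name `pick_blacs` and the mapping below follow Intel's usual MKL link-line advice, under which MPICH-derived stacks such as MVAPICH2 pair with the intelmpi BLACS variant.

```shell
#!/bin/sh
# Illustrative sketch (assumption, not from this thread): map an MPI
# flavor to the MKL BLACS library that Intel's link-line advice pairs
# with it. -lmkl_blacs_intelmpi_lp64 is the usual choice for Intel MPI
# and for MPICH-derived stacks such as MVAPICH2; Open MPI gets its own
# variant.
pick_blacs() {
    case "$1" in
        intelmpi|mpich*|mvapich*) echo "-lmkl_blacs_intelmpi_lp64" ;;
        openmpi)                  echo "-lmkl_blacs_openmpi_lp64"  ;;
        *)                        echo "unknown MPI flavor: $1" >&2
                                  return 1 ;;
    esac
}

pick_blacs mvapich2
```

If that mapping applies to this MKL version, Fermin's `-lmkl_blacs_intelmpi_lp64` with `/usr/local/mvapich2-icc` would be the consistent choice.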
_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at: http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html