I have checked that case.vsp/vns are up-to-date. I guess lapw0_mpi runs properly.
I compiled the source codes with ifort; please find the linking options below:

current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -Dmkl_scalapack -traceback
current:FFTW_OPT:-DFFTW3 -I/usr/local/include
current:FFTW_LIBS:-lfftw3_mpi -lfftw3 -L/usr/local/lib
current:LDFLAGS:$(FOPT) -L/opt/intel/Compiler/11.1/046/mkl/lib/em64t -pthread
current:DPARALLEL:'-DParallel'
current:R_LIBS:-lmkl_lapack -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -lguide
current:RP_LIBS:-lmkl_scalapack_lp64 -lmkl_solver_lp64 -lmkl_blacs_intelmpi_lp64 $(R_LIBS)
current:MPIRUN:/usr/local/mvapich2-icc/bin/mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_
current:MKL_TARGET_ARCH:intel64

Is it ok to use -lmkl_blacs_intelmpi_lp64?

Thanks a lot for all the suggestions.

Regards,
Fermin

-----Original Message-----
From: wien-boun...@zeus.theochem.tuwien.ac.at [mailto:wien-boun...@zeus.theochem.tuwien.ac.at] On Behalf Of Peter Blaha
To: A Mailing list for WIEN2k users
Subject: Re: [Wien] Error in mpi+k point parallelization across multiple nodes

It seems as if lapw0_mpi runs properly ?? Please check that you have NEW (check the date with ls -als !!) and valid case.vsp/vns files, which can be used in e.g. a sequential lapw1 step. That would confirm that mpi and fftw are ok.

The problems seem to start in lapw1_mpi, and this program requires, in addition to mpi, also scalapack. I guess you compile with ifort and link with the mkl ?? There is one crucial blacs library, which must be adapted to your mpi, since these libraries are specific to a particular mpi (intelmpi, openmpi, ...): Which blacs library do you link? -lmkl_blacs_lp64 or another one ?? Check the documentation for the mkl.

On 04.05.2015 at 05:18, lung Fermin wrote:
> I have tried to set MPI_REMOTE=0 and used 32 cores (on 2 nodes) for
> distributing the mpi job. However, the problem still persists...
> but the error message looks different this time:
>
> $> cat *.error
> Error in LAPW2
> ** testerror: Error in Parallel LAPW2
>
> and the output on screen:
>
> Warning: no access to tty (Bad file descriptor).
> Thus no job control in this shell.
> z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17 z1-17
> z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18 z1-18
> number of processors: 32
> LAPW0 END
> [16] Failed to dealloc pd (Device or resource busy)
> [0] Failed to dealloc pd (Device or resource busy)
> [17] Failed to dealloc pd (Device or resource busy)
> [2] Failed to dealloc pd (Device or resource busy)
> [18] Failed to dealloc pd (Device or resource busy)
> [1] Failed to dealloc pd (Device or resource busy)
> LAPW1 END
> LAPW2 - FERMI; weighs written
> [z1-17:mpispawn_0][child_handler] MPI process (rank: 0, pid: 28291) terminated with signal 9 -> abort job
> [z1-17:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 9. MPI process died?
> [z1-17:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
> [z1-17:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node z1-17 aborted: Error while reading a PMI socket (4)
> [z1-18:mpispawn_1][read_size] Unexpected End-Of-File on file descriptor 21. MPI process died?
> [z1-18:mpispawn_1][read_size] Unexpected End-Of-File on file descriptor 21. MPI process died?
> [z1-18:mpispawn_1][handle_mt_peer] Error while reading PMI socket. MPI process died?
> cp: cannot stat `.in.tmp': No such file or directory
>
> stop error
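[Archive note] Peter's point that the MKL blacs library must match the MPI flavor can be sketched as a small shell helper. This is only an illustrative sketch, not part of the thread: the function name `pick_blacs` and the mapping below follow Intel's usual MKL link-line advice, under which MPICH-derived stacks such as MVAPICH2 pair with the intelmpi BLACS variant.

```shell
#!/bin/sh
# Illustrative sketch (assumption, not from this thread): map an MPI
# flavor to the MKL BLACS library that Intel's link-line advice pairs
# with it. -lmkl_blacs_intelmpi_lp64 is the usual choice for Intel MPI
# and for MPICH-derived stacks such as MVAPICH2; Open MPI gets its own
# variant.
pick_blacs() {
    case "$1" in
        intelmpi|mpich*|mvapich*) echo "-lmkl_blacs_intelmpi_lp64" ;;
        openmpi)                  echo "-lmkl_blacs_openmpi_lp64"  ;;
        *)                        echo "unknown MPI flavor: $1" >&2
                                  return 1 ;;
    esac
}

pick_blacs mvapich2
```

If that mapping applies to this MKL version, Fermin's `-lmkl_blacs_intelmpi_lp64` with `/usr/local/mvapich2-icc` would be the consistent choice.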
_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at: http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html