I have checked the MPIRUN option. Previously I used

setenv WIEN_MPIRUN "/usr/local/mvapich2-icc/bin/mpirun -hostfile $PBS_NODEFILE _EXEC_"

Now I changed the hostfile from $PBS_NODEFILE to _HOSTS_. With this I get 4 lapw1_mpi processes running. However, the CPU usage of each job is still only 50% (I use "top" to check this). Why is this the case, and what could I do to get 100% CPU usage? (OMP_NUM_THREADS=1, and the .machine1 and .machine2 files each contain two lines of node1.)

In the pure MPI case, using the .machines file

#
1:node1 node1 node1 node1
granularity:1
extrafine:1
#

I get 4 lapw1_mpi processes running with 100% CPU usage. How should I understand this situation?

The following are some details on the options and the system I used:

1. Wien2k_14.2, mpif90 (compiled with ifort) for MVAPICH2 version 2.0

2. The batch system is PBS and the script I used for qsub:

#!/bin/tcsh
#PBS -l nodes=1:ppn=4
#PBS -l walltime=00:30:00
#PBS -q node1
#PBS -o wien2k_output
#PBS -j oe
cd $PBS_O_WORKDIR
limit vmemoryuse unlimited

# set how many cores to be used for each mpi job
set mpijob=2
set proclist=`cat $PBS_NODEFILE`
echo $proclist
set nproc=$#proclist
echo number of processors: $nproc

# ---------- writing .machines file ------------------
echo '#' > .machines
set i=1
while ($i <= $nproc)
  echo -n '1:' >> .machines
  @ i1 = $i + $mpijob
  @ i2 = $i1 - 1
  echo $proclist[$i-$i2] >> .machines
  set i=$i1
end
echo 'granularity:1' >> .machines
echo 'extrafine:1' >> .machines
# --------- end of .machines file

run_lapw -p -i 40 -cc 0.0001 -ec 0.00001

3. The .machines file:

#
1:node1 node1
1:node1 node1
granularity:1
extrafine:1

and the .machine1 and .machine2 files both contain

node1
node1

4. The parallel_options:

setenv TASKSET "no"
setenv USE_REMOTE 1
setenv MPI_REMOTE 0
setenv WIEN_GRANULARITY 1
setenv WIEN_MPIRUN "/usr/local/mvapich2-icc/bin/mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_"
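The .machines-writing loop in the qsub script above can be sketched in plain POSIX sh (a sketch only; the real job script is tcsh, and here a fixed four-entry node list stands in for the contents of $PBS_NODEFILE):

```shell
# Sketch of the .machines-writing loop: group the PBS node list into
# chunks of $mpijob hosts, one "1:host host" line per mpi job.
mpijob=2                           # cores per mpi job
set -- node1 node1 node1 node1     # real script would use: set -- $(cat "$PBS_NODEFILE")

echo '#' > .machines
while [ $# -gt 0 ]; do
    line="1:$1"; shift             # first host of this mpi job
    j=1
    while [ "$j" -lt "$mpijob" ] && [ $# -gt 0 ]; do
        line="$line $1"; shift     # remaining hosts of this mpi job
        j=$((j + 1))
    done
    echo "$line" >> .machines
done
echo 'granularity:1' >> .machines
echo 'extrafine:1' >> .machines
cat .machines
```

which reproduces the five-line .machines file shown in point 3.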
The compiling options:

current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -Dmkl_scalapack -traceback
current:FFTW_OPT:-DFFTW3 -I/usr/local/include
current:FFTW_LIBS:-lfftw3_mpi -lfftw3 -L/usr/local/lib
current:LDFLAGS:$(FOPT) -L/opt/intel/Compiler/11.1/046/mkl/lib/em64t -pthread
current:DPARALLEL:'-DParallel'
current:R_LIBS:-lmkl_lapack -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -lguide
current:RP_LIBS:-lmkl_scalapack_lp64 -lmkl_solver_lp64 -lmkl_blacs_intelmpi_lp64 $(R_LIBS)
current:MPIRUN:/usr/local/mvapich2-icc/bin/mpirun -np _NP_ -hostfile _HOSTS_ _EXEC_
current:MKL_TARGET_ARCH:intel64

Thanks,
Fermin

-----Original Message-----
From: wien-boun...@zeus.theochem.tuwien.ac.at [mailto:wien-boun...@zeus.theochem.tuwien.ac.at] On Behalf Of Peter Blaha
Sent: Tuesday, January 27, 2015 11:55 PM
To: A Mailing list for WIEN2k users
Subject: Re: [Wien] Job distribution problem in MPI+k point parallelization

It should actually be only 4 lapw1_mpi jobs running with this setup. How did you find this: using "top" or ps?

Do you have thread-parallelization on (OMP_NUM_THREADS=2)? Then it doubles the processes (but you gain nothing ...).

It could also be that your mpirun definition is not ok with respect to your version of mpi, ...

PS: I hope it is clear that such a setup is useful only for testing. The mpi program on 2 cores is "slower/at least not faster" than the sequential program on 1 core.

On 01/27/2015 04:41 PM, lung Fermin wrote:
> Dear Wien2k community,
>
> Recently, I am trying to set up a calculation of a system with ~40
> atoms using MPI+k point parallelization. Suppose in one single node, I
> want to calculate 2 k points, with each k point using 2 processors to
> run mpi parallel.
> The .machines file I used was
> #
> 1:node1 node1
> 1:node1 node1
> granularity:1
> extrafine:1
> #
>
> When I ssh into node1, I saw that there were 8 lapw1_mpi running, each
> with CPU usage of 50%. Is this natural or have I done something wrong?
> What I expected was 4 lapw1_mpi running, each with CPU usage of 100%.
> I am a newbie to mpi parallelization. Please point it out if I have
> misunderstood anything.
>
> Thanks in advance,
> Fermin
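[Editor's note: the 50%-per-process symptom described above can be probed directly. The sketch below (an assumption about the cause, not something established in this thread) lists each lapw1_mpi process together with the processor it last ran on; if two ranks report the same processor, the two mpirun instances are likely being pinned to overlapping cores by the MPI library's default CPU affinity. In MVAPICH2 this pinning is controlled by the MV2_ENABLE_AFFINITY environment variable.]

```shell
# Sketch: show pid, last-used processor (psr) and CPU usage for each
# lapw1_mpi rank. Two ranks sharing one psr value would suggest the
# default CPU affinity of the two mpirun instances overlaps (in
# MVAPICH2, MV2_ENABLE_AFFINITY=0 disables the built-in pinning --
# a hypothesis to test, not a fix confirmed in this thread).
pids=$(pgrep lapw1_mpi || true)
[ -n "$pids" ] || pids=$$        # fall back to this shell so the sketch runs anywhere
for pid in $pids; do
    ps -o pid=,psr=,pcpu=,comm= -p "$pid"
done
```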
_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at: http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html