Dear all,

I compiled WIEN2k 11 on Linux CentOS 5.5 with icc, ifort 11.1, OpenMPI (mpif90), and Intel MKL, using the following parameters:

    K1  Linux (Intel ifort 11.1 compiler + mkl)
    O   Compiler options:        -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
    L   Linker Flags:            $(FOPT) -L/home/yljia/intel/Compiler/11.1/072/mkl/lib/em64t -pthread
    P   Preprocessor flags:      '-DParallel'
    R   R_LIB (LAPACK+BLAS):     -lmkl_lapack -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -lguide
    RP  RP_LIB(SCALAPACK+PBLAS): -lmkl_scalapack_lp64 -lmkl_solver_lp64 -lmkl_blacs_openmpi_lp64 -L/home/yljia/compiler_library/fftw-2.1.5/lib/ -lfftw_mpi -lfftw $(R_LIBS)
    FP  FPOPT(par.comp.options): -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
    MP  MPIRUN commando:         mpirun -np _NP_ --hostfile _HOSTS_ _EXEC_

The program runs fine in non-parallel mode and in k-point parallel mode, but in MPI-parallel mode it produces the error messages shown in the following two files:

1. STDOUT:

    LAPW0 END
    LAPW0 END
    .........
    LAPW0 END
    LAPW1 END
    LAPW1 END
    LAPW1 END
    LAPW1 END
    --------------------------------------------------------------------------
    There are no allocated resources for the application
      /home/yljia/software/wien2k_11/lapw1_mpi
    that match the requested mapping:
      .machine5
    Verify that you have mapped the allocated resources properly using the
    --host or --hostfile specification.
    --------------------------------------------------------------------------
    LAPW1 END
    LAPW1 END
    --------------------------------------------------------------------------
    There are no allocated resources for the application
      /home/yljia/software/wien2k_11/lapw1_mpi
    that match the requested mapping:
      .machine6
    ...........
    ...........
      .machine8
    Verify that you have mapped the allocated resources properly using the
    --host or --hostfile specification.
    --------------------------------------------------------------------------
    FERMI - Error
    cp: cannot stat `.in.tmp': No such file or directory
    rm: cannot remove `.in.tmp': No such file or directory
    rm: cannot remove `.in.tmp1': No such file or directory
    >   stop error

2. TiC.dayfile:

    Calculating TiC in /home/yljia/wien2k/TiC/testqsub/TiC
    on compute-0-12.local with PID 16027
    using WIEN2k_11.1 (Release 14/6/2011) in /home/yljia/software/wien2k_11

        start   (Sat Aug 3 00:42:07 CST 2013) with lapw0 (40/99 to go)

        cycle 1 (Sat Aug 3 00:42:07 CST 2013)  (40/99 to go)

    >   lapw0 -p  (00:42:07) starting parallel lapw0 at Sat Aug 3 00:42:07 CST 2013
    -------- .machine0 : 16 processors
    5.812u 22.540s 0:04.23 670.2%  0+0k 0+0io 205pf+0w
    >   lapw1 -p  (00:42:11) starting parallel lapw1 at Sat Aug 3 00:42:12 CST 2013
    ->  starting parallel LAPW1 jobs at Sat Aug 3 00:42:12 CST 2013
    running LAPW1 in parallel mode (using .machines)
    8 number_of_parallel_jobs
        compute-0-12 compute-0-12(32)  3.181u 0.181s 0:02.77 121.2%  0+0k 0+0io 33pf+0w
        compute-0-12 compute-0-12(32)  2.781u 0.117s 0:02.58 112.0%  0+0k 0+0io 18pf+0w
        compute-0-12 compute-0-12(32)  2.343u 0.089s 0:02.28 106.1%  0+0k 0+0io 17pf+0w
        compute-0-12 compute-0-12(32)  2.818u 0.126s 0:02.52 116.2%  0+0k 0+0io 17pf+0w
        compute-0-2 compute-0-2(32)    0.010u 0.012s 0:00.03 66.6%   0+0k 0+0io 0pf+0w
        compute-0-2 compute-0-2(32)    0.009u 0.014s 0:00.03 33.3%   0+0k 0+0io 0pf+0w
        compute-0-2 compute-0-2(32)    0.010u 0.020s 0:00.04 75.0%   0+0k 0+0io 0pf+0w
        compute-0-2 compute-0-2(32)    0.012u 0.020s 0:00.04 75.0%   0+0k 0+0io 0pf+0w
      Summary of lapw1para:
      compute-0-12  k=0  user=128  wallclock=30.78
    11.349u 1.617s 0:10.77 120.2%  0+0k 0+0io 85pf+0w
    >   lapw2 -p  (00:42:22) running LAPW2 in parallel mode
    **  LAPW2 crashed!
    0.076u 0.108s 0:00.20 85.0%  0+0k 0+0io 9pf+0w
    error: command  /home/yljia/software/wien2k_11/lapw2para lapw2.def  failed
    >   stop error

The following is the shell script I submit. I have 2 nodes, each with 8 cores (except the host node):

    #!/bin/tcsh
    #$ -S /bin/tcsh
    #$ -N W2web_Job
    # MPIR_HOME from submitting environment
    #$ -v MPIR_HOME
    # needs in
    #   $NSLOTS           the number of tasks to be used
    #   $TMPDIR/machines  a valid machine file to be passed to mpirun
    #$ -cwd
    #$ -o job.out
    #$ -e job.err
    #$ -q parallel.q
    #$ -pe mpich 8
    # mpich / jobs_per_node = number of nodes
    set mpijob=1
    set jobs_per_node=8
    setenv OMP_NUM_THREADS 1
    setenv USE_REMOTE 0
    echo "Got $NSLOTS slots." > job.out
    echo "Got $NSLOTS slots." > job.err
    pwd
    set proclist=`cat $TMPDIR/machines`
    set nproc=$NSLOTS
    echo $nproc nodes for this job: $proclist
    if ( -e .proclist_tmp ) rm .proclist_tmp
    if ($jobs_per_node != 8 ) then
        set j=1
        while ($j <= $nproc )
            @ j1 = $j + $jobs_per_node
            @ j1 = $j1 - 1
            echo $proclist[$j-$j1] >> .proclist_tmp
            @ j = $j + 8
        end
        set proclist=`cat .proclist_tmp`
        rm .proclist_tmp
        set nproc=$#proclist
    endif
    echo $nproc nodes for this job: $proclist
    echo '#' > .machines
    # example for an MPI parallel lapw0
    echo -n 'lapw0:' >> .machines
    echo $proclist >> .machines
    # example for k-point and mpi parallel lapw1/2
    #set j=1
    #while ($j <= $jobs_per_node )
    set i=1
    while ($i <= $nproc )
        echo -n '1:' >> .machines
        @ i1 = $i + $mpijob
        @ i2 = $i1 - 1
        echo $proclist[$i-$i2] >> .machines
        set i=$i1
    end
    echo 'granularity:1' >> .machines
    echo 'extrafine:1' >> .machines
    date
    run_lapw -p -ec 0.0001 -NI >& STDOUT

Any comment is welcome! Thanks in advance! Have a nice weekend!

Jia Yalei
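P.S. For reference, since jobs_per_node is 8 the host-regrouping branch in my script is skipped and .machines is built directly from the SGE machines file. The sketch below is a plain-sh reconstruction (not the tcsh script itself), and the host list is an assumption based on the hosts that appear in the dayfile (four slots on compute-0-12, four on compute-0-2):

```shell
#!/bin/sh
# Hypothetical reconstruction of the .machines generation loops above,
# translated from tcsh to plain sh.  The slot list below is assumed;
# the real one comes from $TMPDIR/machines.
proclist="compute-0-12 compute-0-12 compute-0-12 compute-0-12 \
compute-0-2 compute-0-2 compute-0-2 compute-0-2"

echo '#'         >  .machines
printf 'lapw0:'  >> .machines
echo $proclist   >> .machines       # one MPI lapw0 job spanning all 8 slots
for host in $proclist; do           # with mpijob=1, one slot per lapw1/2 job
    echo "1:$host" >> .machines
done
echo 'granularity:1' >> .machines
echo 'extrafine:1'   >> .machines
cat .machines
```

With mpijob=1 this produces eight separate `1:host` lines, i.e. eight one-slot lapw1/2 jobs, which may or may not be the intended layout for the 2-node run.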
_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html