[Wien] forrtl: severe (41): insufficient virtual memory
Dear all,

I keep getting the following error messages when I submit a parallel job. I attach them. The generated .machines file is also attached; please check whether it was generated properly. I intended to run a 24 k-point parallel job.

The compiler versions are:

  fortran : ifort, 12.0 (2011.3.174), mpif90
            [I got the same error message with ifort 11.1, so I guess that the Fortran version is not the origin of this problem.]
  openmpi : 1.4.5
  FFTW2   : 2.1.5
  CC      : icc, 12.0 (2011.3.174)

Compiler options:

  O   Compiler options: -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -mcmodel=medium -i-dynamic -traceback -I$(MKLROOT)/include
  L   Linker Flags: $(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH) -pthread
  P   Preprocessor flags: '-DParallel'
  R   R_LIB (LAPACK+BLAS): -lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread
  RP  RP_LIB(SCALAPACK+PBLAS): -lmkl_scalapack_lp64 -lmkl_solver_lp64 -lmkl_blacs_lp64 -L$(FFTWPATH)/lib -lfftw_mpi -lfftw $(R_LIBS)
  FP  FPOPT(par.comp.options): -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -mcmodel=medium -i-dynamic -traceback -I$(MKLROOT)/include
  MP  MPIRUN commando: mpirun -mca btl self,openib -mca plm_rsh_num_concurrent 400 -mca oob_tcp_listen_mode listen_thread -mca plm_rsh_tree_spawn 1 -np _NP_ -machinefile _HOSTS_ _EXEC_

The error messages are:

~~ abbreviation ~~
 LAPW0 END  LAPW0 END  LAPW0 END  LAPW0 END  LAPW0 END  LAPW0 END
 LAPW0 END  LAPW0 END  LAPW0 END  LAPW0 END  LAPW0 END  LAPW0 END
 LAPW1 END  LAPW1 END  LAPW1 END  LAPW1 END  LAPW1 END  LAPW1 END
 LAPW1 END  LAPW1 END  LAPW1 END  LAPW1 END  LAPW1 END  LAPW1 END
 LAPW1 END  LAPW1 END  LAPW1 END  LAPW1 END  LAPW1 END  LAPW1 END
 LAPW1 END  LAPW1 END  LAPW1 END  LAPW1 END  LAPW1 END  LAPW1 END
forrtl: severe (41): insufficient virtual memory
Image              PC            Routine            Line     Source
libintlc.so.5      2B0540E88F7A  Unknown            Unknown  Unknown
libintlc.so.5      2B0540E87AF5  Unknown            Unknown  Unknown
libifcoremt.so.5   2B0540058CF2  Unknown            Unknown  Unknown
libifcoremt.so.5   2B053FFCAAAB  Unknown            Unknown  Unknown
libifcoremt.so.5   2B054001AFBA  Unknown            Unknown  Unknown
libifcoremt.so.5   2B054001AE11  Unknown            Unknown  Unknown
lapwso             004281C0      MAIN__             131      lapwso.f
lapwso             00402A9C      Unknown            Unknown  Unknown
libc.so.6          003CFA61D974  Unknown            Unknown  Unknown
lapwso             004029A9      Unknown            Unknown  Unknown
forrtl: severe (41): insufficient virtual memory
Image              PC            Routine            Line     Source
libintlc.so.5      2B5D32256F7A  Unknown            Unknown  Unknown
libintlc.so.5      2B5D32255AF5  Unknown            Unknown  Unknown
libifcoremt.so.5   2B5D31426CF2  Unknown            Unknown  Unknown
libifcoremt.so.5   2B5D31398AAB  Unknown            Unknown  Unknown
libifcoremt.so.5   2B5D313E8FBA  Unknown            Unknown  Unknown
libifcoremt.so.5   2B5D313E8E11  Unknown            Unknown  Unknown
lapwso             00409A6A      hmsout_mp_init_hm  78       modules.f
lapwso             004280E2      MAIN__             130      lapwso.f
lapwso             00402A9C      Unknown            Unknown  Unknown
libc.so.6          003CFA61D974  Unknown            Unknown  Unknown
~~ abbreviation ~~

I note that the compilation finished without any error messages.

Any advice will be greatly appreciated!

Hyun-Jung Kim (Ph.D student) | phone : ++82 10 7335 7889
Department of Physics | Hanyang University | e-mail: angpangmokjang at hanmail.net
17 Haengdang-Dong | 133-791 Seongdong-Ku, Seoul/Korea | www: http://physics.hanyang.ac.kr/~sst/
[Wien] forrtl: severe (41): insufficient virtual memory (file attached!!)
Dear all,

(I'm sorry, I forgot to attach the file that includes the error messages and the job script files.)

Hyun-Jung Kim (Ph.D student) | phone : ++82 10 7335 7889
Department of Physics | Hanyang University | e-mail: angpangmokjang at hanmail.net
17 Haengdang-Dong | 133-791 Seongdong-Ku, Seoul/Korea | www: http://physics.hanyang.ac.kr/~sst/

Attachment: error.zip (application/zip, 8025 bytes)
URL: http://zeus.theochem.tuwien.ac.at/pipermail/wien/attachments/20120415/b93c185d/attachment.zip
[Wien] forrtl: severe (41): insufficient virtual memory (file attached!!)
The dayfile indicates that you are doing a non-MPI, k-point parallel calculation using 8 k-parallel lapw1 jobs per node (only lapw0 runs MPI-parallel).

However, the timing is strange:

   tachyon1218(1) 527.132u 2.121s 25:49.23 34.1%

This indicates that a job which should run for about 530 seconds (9 minutes) actually needs 3 times as long. This usually means that
  i) your memory is insufficient, or
  ii) somebody else is using the same node too, or
  iii) it is not a real 8-core node but, e.g., only a 4-core node.

In any case, the error is in lapwso (which is never MPI-parallel), and it seems rather clear that you do not have enough memory to run 8 parallel lapwso jobs on one node.

Modify your script so that you use only 4 parallel jobs per node. That should be much faster, and the memory should probably be sufficient.

--
Peter Blaha
Inst. Materials Chemistry, TU Vienna
Getreidemarkt 9, A-1060 Vienna, Austria
Tel: +43-1-5880115671
Fax:
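The arithmetic behind the timing remark in the reply above can be checked with a few lines of Python. This is only a sketch: it assumes the dayfile line has the csh-style layout "label user-seconds system-seconds elapsed percent", and the function names parse_csh_time and cpu_fraction are made up for this illustration.

```python
def parse_csh_time(line):
    """Split a csh-style timing line into (user, system, elapsed) seconds.

    Assumed layout: '<label> <user>u <system>s <[hh:]mm:ss.ss> <percent>%'
    """
    fields = line.split()
    user = float(fields[1].rstrip("u"))
    system = float(fields[2].rstrip("s"))
    # The elapsed field is colon separated; fold it into seconds.
    elapsed = 0.0
    for part in fields[3].split(":"):
        elapsed = elapsed * 60 + float(part)
    return user, system, elapsed


def cpu_fraction(line):
    """Fraction of the wall-clock time that the job actually spent on a CPU."""
    user, system, elapsed = parse_csh_time(line)
    return (user + system) / elapsed


# The line quoted from the dayfile in this thread.
line = "tachyon1218(1) 527.132u 2.121s 25:49.23 34.1%"
user, system, elapsed = parse_csh_time(line)
print(f"CPU time  : {user + system:.1f} s")
print(f"wall time : {elapsed:.1f} s")
print(f"CPU share : {100 * cpu_fraction(line):.1f} %")
print(f"slowdown  : {elapsed / (user + system):.1f}x")
```

A CPU share of roughly one third means the process spent about two thirds of its wall-clock time waiting, which fits all three explanations listed in the reply (swapping, a shared node, or fewer real cores than jobs).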
[Wien] forrtl: severe (41): insufficient virtual memory (file attached!!)
It is exactly what it says. You are trying to run more tasks on a single cpu than you have memory for. The idea of mpi is to share cpu and memory. If you have a cpu with 24 cores (unlikely), you might run (for instance) 3 tasks each using 8 cores, e.g. with three lines of node:8. You probably have only 8 cores, so for a large job you might use node:8.

Please do a little Google searching on the principles of mpi; that is much better than any email response.

---
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu 1-847-491-3996
Research is to see what everybody else has seen, and to think what nobody else has thought
Albert Szent-Gyorgi
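Both replies in this thread come down to placing fewer independent jobs on each node. The sketch below writes out the job lines of a .machines file for that purpose. It assumes the usual WIEN2k convention that a line "1:host" starts one k-point parallel job on that host and "1:host:n" starts one MPI job on n cores of it. The function name machines_lines, the node count, and every host name except tachyon1218 are hypothetical, and the lapw0 line is deliberately left out.

```python
def machines_lines(hosts, jobs_per_node, cores_per_job=1):
    """Return the text of the job lines of a .machines file.

    hosts         -- node names (hypothetical here, normally taken from the queue)
    jobs_per_node -- independent k-point jobs to start on each node
    cores_per_job -- 1 for plain k-point parallel, >1 for one MPI job per line
    """
    lines = []
    for host in hosts:
        for _ in range(jobs_per_node):
            if cores_per_job == 1:
                lines.append(f"1:{host}")
            else:
                lines.append(f"1:{host}:{cores_per_job}")
    # Common settings for k-point parallel runs (assumed defaults).
    lines.append("granularity:1")
    lines.append("extrafine:1")
    return "\n".join(lines) + "\n"


# Hypothetical three-node allocation; only tachyon1218 appears in the thread.
hosts = ["tachyon1218", "tachyon1219", "tachyon1220"]

# 4 k-point parallel jobs per node instead of 8.
print(machines_lines(hosts, jobs_per_node=4))

# One MPI job per node using all 8 cores, i.e. three lines of node:8.
print(machines_lines(hosts, jobs_per_node=1, cores_per_job=8))
```

With 4 jobs per node, each lapwso process has about twice as much memory available as with 8. Whether that is enough depends on the size of the case, which the thread does not state.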