[Wien] U correction in Heusler alloys
Dear users,

I am not able to construct the .indmc, .indm and .inorb files for Heusler alloys. My system has four atoms and I want to apply the U correction to the first atom. I tried to adapt the template, which is written for two atoms, to four atoms, but I failed to construct the files correctly. I got the following error:

    cycle 1 (Fri Aug 2 17:08:28 IST 2013) (40/99 to go)
    lapw0 (17:08:28)
    WARNING: The EFG-MATRIX is a NULLMATRIX !
    WARNING: The EFG-MATRIX is a NULLMATRIX !
    WARNING: The EFG-MATRIX is a NULLMATRIX !
    WARNING: The EFG-MATRIX is a NULLMATRIX !
    8.7u 0.1s 0:08.84 100.0% 0+0k 0+2008io 0pf+0w
    lapw1 -up -c (17:08:37) 140.9u 7.5s 2:29.17 99.5% 0+0k 0+297136io 0pf+0w
    lapw1 -dn -c (17:11:06) 141.7u 7.0s 2:29.25 99.7% 0+0k 0+295944io 0pf+0w
    lapwso -up -orb -c (17:13:35) 29.0u 1.9s 0:33.34 92.8% 0+0k 6968+912072io 37pf+0w
    lapw2 -up -c -so (17:14:09) 100.4u 1.8s 1:42.36 99.9% 0+0k 0+3472io 0pf+0w
    lapw2 -dn -c -so (17:15:51) 101.6u 1.7s 1:43.50 99.9% 0+0k 0+3400io 0pf+0w
    lapwdm -up -c -so (17:17:35) 13.5u 0.4s 0:14.13 99.1% 0+0k 4240+144io 22pf+0w
    lcore -up (17:17:49) 0.0u 0.0s 0:00.04 125.0% 0+0k 0+456io 0pf+0w
    lcore -dn (17:17:49) 0.0u 0.0s 0:00.04 125.0% 0+0k 0+456io 0pf+0w
    mixer -orb (17:17:49)
    INFO: LM-LIST in CLMSUM changed in MIXER
    INFO: LM-LIST in CLMSUM changed in MIXER
    INFO: LM-LIST in CLMSUM changed in MIXER
    INFO: LM-LIST in CLMSUM changed in MIXER
    Note: k-list has changed
    0.1u 0.1s 0:00.22 104.5% 0+0k 0+3544io 0pf+0w
    :ENERGY convergence: 0 0.0001 0
    :CHARGE convergence: 0 0.0001 0
    cycle 2 (Fri Aug 2 17:17:49 IST 2013) (39/98 to go)
    lapw0 (17:17:49) 8.9u 0.1s 0:09.04 100.0% 0+0k 0+2528io 0pf+0w
    orb -up (17:17:58) 0.0u 0.0s 0:00.06 16.6% 0+0k 2336+24io 13pf+0w
    error: command /home/idris/Desktop/wien2k12/orb uporb.def failed
    stop error

I am attaching the required .indmc, .indm and .inorb files with this mail. I need your advice: what should I do to avoid these errors? Please suggest a solution.
With regards,
Idris Hamid

Attachments: case.indm, case.indmc, case.inorb, case.inso

___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at: http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
Re: [Wien] U correction in Heusler alloys
Hi,

If you want to apply U only for the first atom, then indmc and inorb should be as follows.

inorb:

    1 2 0          nmod, natorb, ipr
    PRATT 1.0      BROYD/PRATT, mixing
    1 1 2          iatom nlorb, lorb
    1              nsic 0..AFM, 1..SIC, 2..HFM
    0.52 0.00      U J (Ry) Note: we recommend to use U_eff = U-J and J=0

indmc:

    -9.            Emin cutoff energy
    1              number of atoms for which density matrix is calculated
    1 1 2          index of 1st atom, number of L's, L1
    0 0            r-index, (l,s)index

Good luck!
Hong
Re: [Wien] U correction in Heusler alloys
For four atoms, change natorb from 2 to 4 and add two more U J lines in case.inorb.
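The natorb bookkeeping mentioned in this thread can be sketched as a small bash helper. This is only an illustration, not a WIEN2k tool: the function name make_inorb and its argument order are made up here, and it assumes nmod=1, PRATT mixing, nsic=1 and a single l=2 orbital per atom, as in the example values posted in the thread. It writes one iatom line and one U J line per listed atom and sets natorb to the number of atoms given.

```shell
#!/bin/bash
# Hypothetical helper (not part of WIEN2k): print a case.inorb-style file
# for nmod=1 with one l=2 entry per listed atom index.
# Usage: make_inorb U J atom1 [atom2 ...]   (U and J in Ry)
make_inorb() {
    local u=$1 j=$2
    shift 2
    local natorb=$#            # natorb = number of atoms that get a U J line
    local atom
    printf '%d %d 0   nmod, natorb, ipr\n' 1 "$natorb"
    printf 'PRATT 1.0   BROYD/PRATT, mixing\n'
    for atom in "$@"; do
        printf '%d 1 2   iatom, nlorb, lorb\n' "$atom"
    done
    printf '1   nsic 0..AFM, 1..SIC, 2..HFM\n'
    for atom in "$@"; do
        printf '%s %s   U J (Ry) for atom %d\n' "$u" "$j" "$atom"
    done
}

# Example: the same U on atoms 1 to 4 (values are placeholders)
make_inorb 0.52 0.00 1 2 3 4
```

Calling it with a single atom index, for example `make_inorb 0.52 0.00 1`, gives natorb=1 and exactly one U J line, which is the case where U is wanted on the first atom only.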
Re: [Wien] A trick for mpi debugging
Dear Prof. Marks,

Just a quick question: in case the openmpi launcher replaces ssh, should I change USE_REMOTE to 0 on a cluster?

Thank you one more time,
Luis

2013/7/27 Laurence Marks l-ma...@northwestern.edu:

> WARNING 1: To be used with care, and customized as needed
> WARNING 2: Valid for impi and perhaps other, but not all variants
> WARNING 3: Please look at what these options mean...
>
> My parallel_options file with NU's supercomputer, which contains various debug and other options (some recommended by Intel, some by the local sys_admin):
>
>     setenv USE_REMOTE 1
>     setenv MPI_REMOTE 0
>     setenv WIEN_GRANULARITY 1
>     setenv DAPL_DBG_TYPE 0
>     # Normal
>     #setenv WIEN_MPIRUN "mpirun -n _NP_ -machinefile _HOSTS_ _EXEC_"
>     # To turn on verbose
>     #setenv WIEN_MPIRUN "mpirun -bootstrap-exec ~/bin/hssh -n _NP_ -machinefile _HOSTS_ _EXEC_"
>     # To use more recent, privately compiled ssh
>     #setenv WIEN_MPIRUN "mpirun -bootstrap-exec $HOME/local/bin/ssh -n _NP_ -machinefile _HOSTS_ _EXEC_"
>     # To use openmpi to launch
>     setenv WIEN_MPIRUN "mpirun -bootstrap-exec $WIENROOT/hopen -n _NP_ -machinefile _HOSTS_ _EXEC_"
>     set sleepy = 0.2
>     set delay = 0.1
>     unset DAPL_DBG
>     #Turn on Hydra debug on Quest
>     #setenv I_MPI_HYDRA_DEBUG 1
>     #Turn on MPI DEBUG
>     #setenv I_MPI_DEBUG 1
>     #setenv I_MPI_DEBUG_OUTPUT mpi_debug%h_%r
>     setenv I_MPI_FABRICS_LIST dapl,tcp
>     setenv I_MPI_FALLBACK enable
>
> On Sat, Jul 27, 2013 at 2:53 PM, Luis Ogando lcoda...@gmail.com wrote:
>
>> Dear Prof. Marks,
>> Could you, please, send me a template for the parallel_options file where this implementation was done? I am sorry for that, but I am really far from being an expert.
>> All the best,
>> Luis
>>
>> 2013/7/22 Laurence Marks l-ma...@northwestern.edu:
>>
>>> A brief followup which may be useful (or not) for others in the future with mpi problems. I have been able to work around a mysterious impi/ssh bug on NU's supercomputer by replacing ssh by the openmpi/mpirun launcher. The hack is gross, but very stable.
>>>
>>> Step 1:
>>> 1) Add --bootstrap-exec=$WIENROOT/hopen to $WIENROOT/parallel_options.
>>> 2) Create the executable file $WIENROOT/hopen containing
>>>
>>>     #!/bin/bash
>>>     a=`echo $@ | sed -e 's/-x -q//'`
>>>     $OPENMPI/bin/mpirun -np 1 --host $a
>>>
>>> (change $OPENMPI to where it has been compiled).
>>>
>>> On Thu, Jul 18, 2013 at 10:38 AM, Laurence Marks l-ma...@northwestern.edu wrote:
>>>
>>>> On a cluster I am using I am having a problem with ssh connections as part of impi/mpirun about 0.1-0.2% of the time; what happens is that they fail to launch and become zombies (ps shows [ssh] defunct). Since fiddling through all the options within mpirun can be hard (particularly for impi which is rather fast), I found (after a comment from someone on the openssh list) a useful hack. I am providing it here as it is a nice way around things, and might be useful to others in the future.
>>>>
>>>> The trick is to add --bootstrap-exec ~/bin/hssh or similar to the mpirun line in $WIENROOT/parallel_options, then create the executable ~/bin/hssh with something similar to:
>>>>
>>>>     #!/bin/bash
>>>>     a=`echo $@ | sed -e 's/-q/-v/'`
>>>>     ssh $a
>>>>
>>>> The above allows me to turn verbose output on in the ssh command since impi insists on setting -q (quiet). For other cases something similar can be done.
>
> --
> Professor Laurence Marks
> Department of Materials Science and Engineering
> Northwestern University
> www.numis.northwestern.edu 1-847-491-3996
> Research is to see what everybody else has seen, and to think what nobody else has thought
> Albert Szent-Gyorgi
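The two wrappers quoted in this message both work by rewriting the argument string that the MPI launcher passes to its bootstrap command. As a minimal sketch, the two sed substitutions can be tried in isolation before putting them into a wrapper. The function names rewrite_verbose and strip_flags and the sample argument string are made up for illustration; what impi really passes on a given system has to be inspected there.

```shell
#!/bin/bash
# Sketch of the argument rewriting used in the hssh and hopen wrappers.
# printf '%s\n' "$*" is used instead of echo so that a leading option-like
# argument (e.g. -n) cannot be swallowed by echo itself.

# As in hssh: replace the first -q (quiet) by -v (verbose).
rewrite_verbose() {
    printf '%s\n' "$*" | sed -e 's/-q/-v/'
}

# As in hopen: drop the "-x -q" pair so the remainder can be handed to
# another launcher.
strip_flags() {
    printf '%s\n' "$*" | sed -e 's/-x -q//'
}

# Made-up argument string, for illustration only
rewrite_verbose -x -q node01 some_command
strip_flags -x -q node01 some_command
```

Note that strip_flags leaves a leading blank in its output; in the wrappers this does not matter because `$a` is expanded unquoted.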
Re: [Wien] U correction in Heusler alloys
02.08.2013 16:08, Hong Jiang wrote:

> If you want to apply U only for the first atom, then indmc and inorb should be as follows
>
> inorb:
>
>     1 2 0          nmod, natorb, ipr
>     PRATT 1.0      BROYD/PRATT, mixing
>     1 1 2          iatom nlorb, lorb
>     1              nsic 0..AFM, 1..SIC, 2..HFM
>     0.52 0.00
>
> indmc:
>
>     -9.            Emin cutoff energy
>     1              number of atoms for which density matrix is calculated
>     1 1 2          index of 1st atom, number of L's, L1
>     0 0            r-index, (l,s)index

In inorb the first line should be

    1 1 0          nmod, natorb, ipr

natorb (number of atoms at which you add orb potential) = 1

Best wishes
Lyudmila Dobysheva
--
Phys.-Techn. Institute of Ural Br. of Russian Ac. of Sci.
426001 Izhevsk, ul.Kirova 132
RUSSIA
--
Tel.: 7(3412) 442118 (home), 218988 (office), 722529 (Fax)
E-mail: l...@ftiudm.ru lyuk...@mail.ru (office) lyuk...@gmail.com (home)
Skype: lyuka17 (home), lyuka18 (office)
http://fti.udm.ru/content/view/25/103/lang,english/
--
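The mismatch pointed out here can be caught before running the SCF cycle with a simple line count. The following is a hedged sketch, not a WIEN2k utility: the function name check_inorb is invented, and it assumes nmod=1 with exactly one lorb per atom, so that a consistent file has 2 header lines, natorb iatom lines, 1 nsic line and natorb U J lines.

```shell
#!/bin/bash
# Hypothetical check (not a WIEN2k tool): compare natorb on line 1 with the
# number of non-blank lines in a case.inorb-style file.
# Assumes nmod=1 and nlorb=1 for every atom.
check_inorb() {
    awk '
        NF == 0 { next }                  # ignore blank lines
        { n++ }
        n == 1 { natorb = $2 }            # line 1: nmod, natorb, ipr
        END {
            # 2 header lines + natorb iatom lines + nsic + natorb U J lines
            expected = 3 + 2 * natorb
            if (n == expected) { print "ok: natorb=" natorb; exit 0 }
            print "mismatch: natorb=" natorb " needs " expected " lines, found " n
            exit 1
        }
    ' "$1"
}

# Demo on a temporary copy of the inorb posted in the thread
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
1 2 0      nmod, natorb, ipr
PRATT 1.0  BROYD/PRATT, mixing
1 1 2      iatom nlorb, lorb
1          nsic 0..AFM, 1..SIC, 2..HFM
0.52 0.00  U J (Ry)
EOF
check_inorb "$tmp" || true               # natorb=2, but only one atom block
sed -i.bak '1s/^1 2 0/1 1 0/' "$tmp"     # set natorb to 1
check_inorb "$tmp"
rm -f "$tmp" "$tmp.bak"
```

With natorb=2 the file is two lines short of what the count implies, which is consistent with orb stopping while reading its input; with natorb=1 the counts agree.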
Re: [Wien] runing error: forrtl: severe (71): integer divide by zero
Thank you very much. It works.

I removed -DINTEL_VML and recompiled lapw1. When I submitted the job, lapw1 ran without error, but the other programs produced a similar error. So, as Laurence Marks said in the link you gave, I might have a bad libsvml and/or incompatible versions/ifort or similar. I changed the MKL path to ~/intel/Compiler/11.1/072/mkl/lib/em64t, which was installed together with the ifort compiler, and then recompiled all the programs. Finally, the job completed without any error.

I have another problem, concerning MPI parallel runs, and I will write a separate email about it.

Best regards,
Jia Yalei

At 2013-08-02 00:00:19, Torsten Weissbach torsten.weissb...@physik.tu-freiberg.de wrote:

> Hi, sorry that was the wrong link, of course. A similar question is discussed here:
> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg05162.html
> maybe this can solve your problem.
[Wien] There are no allocated resources for the application
Dear all,

I compiled WIEN2k 11 on Linux CentOS 5.5 with icc, ifort 11.1, openmpi mpif90 and Intel MKL, with the following parameters:

    K1  Linux (Intel ifort 11.1 compiler + mkl)
    O   Compiler options: -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
    L   Linker Flags: $(FOPT) -L/home/yljia/intel/Compiler/11.1/072/mkl/lib/em64t -pthread
    P   Preprocessor flags: '-DParallel'
    R   R_LIB (LAPACK+BLAS): -lmkl_lapack -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -lguide
    RP  RP_LIB (SCALAPACK+PBLAS): -lmkl_scalapack_lp64 -lmkl_solver_lp64 -lmkl_blacs_openmpi_lp64 -L/home/yljia/compiler_library/fftw-2.1.5/lib/ -lfftw_mpi -lfftw $(R_LIBS)
    FP  FPOPT (par.comp.options): -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
    MP  MPIRUN commando: mpirun -np _NP_ --hostfile _HOSTS_ _EXEC_

The program runs in non-parallel mode and in k-point parallel mode. But in MPI parallel mode it gives error messages in the following two files:

1. STDOUT:

    LAPW0 END
    LAPW0 END
    .
    LAPW0 END
    LAPW1 END
    LAPW1 END
    LAPW1 END
    LAPW1 END
    --
    There are no allocated resources for the application
    /home/yljia/software/wien2k_11/lapw1_mpi
    that match the requested mapping:
    .machine5
    Verify that you have mapped the allocated resources properly using the
    --host or --hostfile specification.
    --
    LAPW1 END
    LAPW1 END
    --
    There are no allocated resources for the application
    /home/yljia/software/wien2k_11/lapw1_mpi
    that match the requested mapping:
    .machine6
    ... ...
    .machine8
    Verify that you have mapped the allocated resources properly using the
    --host or --hostfile specification.
    --
    FERMI - Error
    cp: cannot stat `.in.tmp': No such file or directory
    rm: cannot remove `.in.tmp': No such file or directory
    rm: cannot remove `.in.tmp1': No such file or directory
    stop error

2. TiC.dayfile:

    Calculating TiC in /home/yljia/wien2k/TiC/testqsub/TiC
    on compute-0-12.local with PID 16027
    using WIEN2k_11.1 (Release 14/6/2011) in /home/yljia/software/wien2k_11

    start (Sat Aug 3 00:42:07 CST 2013) with lapw0 (40/99 to go)

    cycle 1 (Sat Aug 3 00:42:07 CST 2013) (40/99 to go)

    lapw0 -p (00:42:07) starting parallel lapw0 at Sat Aug 3 00:42:07 CST 2013
    .machine0 : 16 processors
    5.812u 22.540s 0:04.23 670.2% 0+0k 0+0io 205pf+0w
    lapw1 -p (00:42:11) starting parallel lapw1 at Sat Aug 3 00:42:12 CST 2013
    - starting parallel LAPW1 jobs at Sat Aug 3 00:42:12 CST 2013
    running LAPW1 in parallel mode (using .machines)
    8 number_of_parallel_jobs
    compute-0-12 compute-0-12(32) 3.181u 0.181s 0:02.77 121.2% 0+0k 0+0io 33pf+0w
    compute-0-12 compute-0-12(32) 2.781u 0.117s 0:02.58 112.0% 0+0k 0+0io 18pf+0w
    compute-0-12 compute-0-12(32) 2.343u 0.089s 0:02.28 106.1% 0+0k 0+0io 17pf+0w
    compute-0-12 compute-0-12(32) 2.818u 0.126s 0:02.52 116.2% 0+0k 0+0io 17pf+0w
    compute-0-2 compute-0-2(32) 0.010u 0.012s 0:00.03 66.6% 0+0k 0+0io 0pf+0w
    compute-0-2 compute-0-2(32) 0.009u 0.014s 0:00.03 33.3% 0+0k 0+0io 0pf+0w
    compute-0-2 compute-0-2(32) 0.010u 0.020s 0:00.04 75.0% 0+0k 0+0io 0pf+0w
    compute-0-2 compute-0-2(32) 0.012u 0.020s 0:00.04 75.0% 0+0k 0+0io 0pf+0w
    Summary of lapw1para:
    compute-0-12 k=0 user=128 wallclock=30.78
    11.349u 1.617s 0:10.77 120.2% 0+0k 0+0io 85pf+0w
    lapw2 -p (00:42:22) running LAPW2 in parallel mode
    ** LAPW2 crashed!
    0.076u 0.108s 0:00.20 85.0% 0+0k 0+0io 9pf+0w
    error: command /home/yljia/software/wien2k_11/lapw2para lapw2.def failed
    stop error

The following is the shell script I submit. I have 2 nodes, and each has 8 cores [except the host node]:

    #!/bin/tcsh
    #$ -S /bin/tcsh
    #$ -N W2web_Job
    # MPIR_HOME from submitting environment
    #$ -v MPIR_HOME
    # needs in
    # $NSLOTS
    # the number of tasks to be used
    # $TMPDIR/machines
    # a valid machine file to be passed to mpirun
    #$ -cwd
    #$ -o job.out
    #$ -e job.err
    #$ -q parallel.q
    #$ -pe mpich 8
    # mpich / jobs_per_node = number of nodes
    set mpijob=1
    set jobs_per_node=8
    setenv OMP_NUM_THREADS 1
    setenv USE_REMOTE 0
    echo Got $NSLOTS slots. job.out
    echo Got $NSLOTS slots. job.err
    pwd
    set proclist=`cat $TMPDIR/machines`
    set nproc=$NSLOTS
    echo $nproc nodes for this job: $proclist
    if( -e .proclist_tmp) rm .proclist_tmp
    if ($jobs_per_node != 8 ) then
    set j=1
    while ($j = $nproc )
    @ j1 = $j + $jobs_per_node
    @
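Job scripts of this kind usually end by turning the scheduler's host list into a .machines file, and the host names written there are what the per-job .machineN files are derived from. As a rough sketch only (the posted script is cut off, so this is not a reconstruction of it), the following bash function groups a one-host-per-slot list into one MPI job per host. The function name make_machines is hypothetical, and the choice of one job per host plus the granularity and extrafine lines are assumptions to be adapted to the actual cluster.

```shell
#!/bin/bash
# Hypothetical sketch: read one hostname per slot on stdin (as in an SGE
# machine file) and print a .machines-style file with one MPI job per host.
make_machines() {
    LC_ALL=C sort | uniq -c | awk '
        { slots[NR] = $1; hosts[NR] = $2 }   # uniq -c prints: count hostname
        END {
            # lapw0 runs once, spread over all hosts
            printf "lapw0:"
            for (i = 1; i <= NR; i++)
                printf "%s%s:%d", (i > 1 ? " " : ""), hosts[i], slots[i]
            printf "\n"
            # one k-point job per host, using all slots of that host via MPI
            for (i = 1; i <= NR; i++)
                printf "1:%s:%d\n", hosts[i], slots[i]
            print "granularity:1"
            print "extrafine:1"
        }'
}

# Example with made-up slot counts for the two hosts seen in the dayfile
printf 'compute-0-12\ncompute-0-12\ncompute-0-2\ncompute-0-2\n' | make_machines
```

In a real job the input would come from the scheduler, for example `make_machines < $TMPDIR/machines > .machines`, and it is worth comparing the host names in the result with the ones the scheduler actually allocated.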
Re: [Wien] A trick for mpi debugging
I am not sure if I can give you the right answer. My guess is to have it as 1, but I do not know all the details of your system, and if I remember right you have an sgi system. Try both, then let us/me know what works (or does not).

For reference, I have it working fine with USE_REMOTE 1, and I don't currently want to change to test (particularly as I am on travel).

On Fri, Aug 2, 2013 at 8:36 AM, Luis Ogando lcoda...@gmail.com wrote:

> Dear Prof. Marks,
> Just a quick question : in case that the openmpi launcher replaces ssh, should I change USE_REMOTE to 0 in a cluster ?
> Thank you one more time,
> Luis

--
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu 1-847-491-3996
Research is to see what everybody else has seen, and to think what nobody else has thought
Albert Szent-Gyorgi