[Wien] problem in parallel calculations

2012-04-20 Thread hyunjung kim
Dear all,

I have been trying for almost a month to get parallel calculations running.

I am working on the following system:
Model       : Sun Blade 6275 cluster
Processor   : Intel Xeon X5570
CPUs/node   : 8
Memory      : 24 GB/node, 3 GB/core
Network     : InfiniBand 40G 8X QDR
OS          : Red Hat Enterprise Linux 5.3
Job control : SGE 6.2u5

Compiler : intel 11.1 (MKL therein)
MPI : openMPI 1.3.3
FFTW: 2.1.5 (FFTW was compiled with intel 11.1 and configured with 
--enable-mpi LDFLAGS=-L$MPIHOME/$LIBRARYPATH F77=ifort CC=icc --with-sgi-mp 
--with-openmp --enable-threads)


Compiler options:
 O   Compiler options:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML 
-mcmodel=medium -i-dynamic -CB -g -traceback -I$(MKLROOT)/include
 L   Linker Flags:$(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH) 
-pthread
 P   Preprocessor flags   '-DParallel'
 R   R_LIB (LAPACK+BLAS): -lmkl_lapack -lmkl_intel_lp64 -lmkl_intel_thread 
-lmkl_core -openmp -lpthread -lguide

 RP  RP_LIB(SCALAPACK+PBLAS): -lmkl_scalapack_lp64 -lmkl_solver_lp64 
-lmkl_blacs_lp64 -L$(FFTWPATH)/lib -lfftw_mpi -lfftw $(R_LIBS)
 FP  FPOPT(par.comp.options): -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML 
-mcmodel=medium -i-dynamic -CB -g -traceback -I$(MKLROOT)/include
 MP  MPIRUN commando: mpirun -mca btl ^tcp -mca plm_rsh_num_concurrent 
48 -mca oob_tcp_listen_mode listen_thread -mca plm_rsh_tree_spawn 1 -np _NP_ 
-machinefile _HOSTS_ _EXEC_



In this environment, compilation completes without any error messages.

To make the .machines file, I use proclist=(`cat $TMPDIR/machines`), which
gives the list of nodes, one entry per CPU. If I request 384 CPUs in total in
the job script, it returns 384 entries. Since each entry is a node name, every
node appears 8 times.

Case 1: k-point parallelism + 8 MPI tasks per k-point
1. Since I want to calculate 48 k-points with 8 MPI tasks per node (i.e. per
k-point), the .machines file was:

lapw0:tachyon2066:8 tachyon1982:8 tachyon1207:8 tachyon1396:8 tachyon1152:8 
tachyon2440:8 tachyon2120:8 tachyon1555:8 tachyon2319:8 tachyon2470:8 
tachyon1612:8 tachyon2274:8 tachyon1402:8 tachyon2846:8 tachyon2091:8 
tachyon1622:8 tachyon1920:8 tachyon2213:8 tachyon1832:8 tachyon2672:8 
tachyon2370:8 tachyon2545:8 tachyon2359:8 tachyon1770:8 tachyon1018:8 
tachyon1456:8 tachyon1429:8 tachyon3074:8 tachyon1169:8 tachyon2400:8 
tachyon2688:8 tachyon1099:8 tachyon2906:8 tachyon1394:8 tachyon1830:8 
tachyon1383:8 tachyon2157:8 tachyon2818:8 tachyon2644:8 tachyon2283:8 
tachyon1213:8 tachyon1542:8 tachyon2726:8 tachyon2152:8 tachyon1135:8 
tachyon2144:8 tachyon3015:8 tachyon2077:8
1:tachyon2066:8
1:tachyon1982:8
1:tachyon1207:8
1:tachyon1396:8
1:tachyon1152:8
1:tachyon2440:8
1:tachyon2120:8
1:tachyon1555:8
1:tachyon2319:8
1:tachyon2470:8
1:tachyon1612:8
1:tachyon2274:8
1:tachyon1402:8
1:tachyon2846:8
1:tachyon2091:8
1:tachyon1622:8
1:tachyon1920:8
1:tachyon2213:8
1:tachyon1832:8
1:tachyon2672:8
1:tachyon2370:8
1:tachyon2545:8
1:tachyon2359:8
1:tachyon1770:8
1:tachyon1018:8
1:tachyon1456:8
1:tachyon1429:8
1:tachyon3074:8
1:tachyon1169:8
1:tachyon2400:8
1:tachyon2688:8
1:tachyon1099:8
1:tachyon2906:8
1:tachyon1394:8
1:tachyon1830:8
1:tachyon1383:8
1:tachyon2157:8
1:tachyon2818:8
1:tachyon2644:8
1:tachyon2283:8
1:tachyon1213:8
1:tachyon1542:8
1:tachyon2726:8
1:tachyon2152:8
1:tachyon1135:8
1:tachyon2144:8
1:tachyon3015:8
1:tachyon2077:8
granularity:1
extrafine:1
lapw2_vector_split:1
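
For reference, a file of the shape shown above could be generated from the scheduler's host list with something like the following. This is only a minimal sketch: the function name build_machines and its mpi_per_kpoint parameter are made up for illustration, and it assumes the host list contains one entry per slot, as described above.

```python
from collections import OrderedDict


def build_machines(slots, mpi_per_kpoint=8):
    """Build .machines text from a host list with one entry per slot.

    Hypothetical helper: mirrors the layout of the file shown above.
    """
    # Count slots per host, keeping the order in which hosts first appear.
    counts = OrderedDict()
    for host in slots:
        counts[host] = counts.get(host, 0) + 1

    lines = []
    # lapw0 line: every host with its slot count.
    lines.append("lapw0:" + " ".join(f"{h}:{n}" for h, n in counts.items()))
    # One k-point line per host, using at most mpi_per_kpoint MPI tasks.
    for host, n in counts.items():
        lines.append(f"1:{host}:{min(n, mpi_per_kpoint)}")
    lines += ["granularity:1", "extrafine:1", "lapw2_vector_split:1"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Two-node demo input; a real run would read the names from $TMPDIR/machines.
    demo = ["tachyon2066"] * 8 + ["tachyon1982"] * 8
    print(build_machines(demo), end="")
```

With 384 entries covering 48 nodes, this would give one lapw0 line listing 48 hosts and 48 lines of the form 1:host:8.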

In this case, case.dayfile shows:

on tachyon2066 with PID 13780
using WIEN2k_11.1 (Release 5/4/2011) in /home01/x584cjh/code/WIEN2k_11


start   (Fri Apr 20 09:13:32 KST 2012) with lapw0 (40/99 to go)

cycle 1 (Fri Apr 20 09:13:32 KST 2012)  (40/99 to go)

   lapw0 -p(09:13:32) starting parallel lapw0 at Fri Apr 20 09:13:32 KST 
 2012
 .machine0 : 384 processors
tachyon2066:14892:  open_hca: getaddr_netdev ERROR: Connection refused. Is ib1 
configured?
tachyon2066:14892:  open_hca: device mthca0 not found
tachyon2066:14892:  open_hca: device mthca0 not found
tachyon2066:14892:  open_hca: device ipath0 not found
tachyon2066:14892:  open_hca: device ipath0 not found
tachyon2066:14894:  open_hca: getaddr_netdev ERROR: Connection refused. Is ib1 
configured?
tachyon2066:14894:  open_hca: device mthca0 not found
tachyon2066:14894:  open_hca: device mthca0 not found
tachyon2066:14891:  open_hca: getaddr_netdev ERROR: Connection refused. Is ib1 
configured?
tachyon2066:14894:  open_hca: device ipath0 not found
tachyon2066:14894:  open_hca: device ipath0 not found
tachyon2319:23519:  open_hca: getaddr_netdev ERROR: Connection refused. Is ib1 
configured?
tachyon2066:14891:  open_hca: device mthca0 not found
tachyon2066:14891:  open_hca: device mthca0 not found
tachyon1982:11799:  open_hca: getaddr_netdev ERROR: Connection refused. Is ib1 
configured?
tachyon2319:23519:  open_hca: device mthca0 not found
tachyon2319:23519:  open_hca: device mthca0 not found
tachyon1982:11799:  open_hca: 

[Wien] forrtl: severe (41): insufficient virtual memory

2012-04-15 Thread hyunjung kim
Dear all,

I constantly get the following error messages when I submit the parallel job.

I attach them.
The generated .machines file is also attached; please check whether it was 
generated properly. I intended a 24 k-point parallel job.

The compiler versions are:
fortran : ifort 12.0 (2011.3.174), mpif90 [I got the same error message with 
ifort 11.1, so I guess the Fortran version is not the origin of this 
problem.]
openmpi : 1.4.5
FFTW2   : 2.1.5
CC      : icc 12.0 (2011.3.174)
Compiler options:
 O   Compiler options:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML 
-mcmodel=medium -i-dynamic -traceback -I$(MKLROOT)/include
 L   Linker Flags:$(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH) 
-pthread
 P   Preprocessor flags   '-DParallel'
 R   R_LIB (LAPACK+BLAS): -lmkl_lapack95_lp64 -lmkl_intel_lp64 
-lmkl_intel_thread -lmkl_core -openmp -lpthread

 RP  RP_LIB(SCALAPACK+PBLAS): -lmkl_scalapack_lp64 -lmkl_solver_lp64 
-lmkl_blacs_lp64 -L$(FFTWPATH)/lib -lfftw_mpi -lfftw $(R_LIBS)
 FP  FPOPT(par.comp.options): -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML 
-mcmodel=medium -i-dynamic -traceback -I$(MKLROOT)/include
 MP  MPIRUN commando: mpirun -mca btl self,openib -mca 
plm_rsh_num_concurrent 400 -mca oob_tcp_listen_mode listen_thread -mca 
plm_rsh_tree_spawn 1 -np _NP_ -machinefile _HOSTS_ _EXEC_


The error messages are:
~~ abbreviation ~~
 LAPW0 END 
 LAPW0 END 
 LAPW0 END 
 LAPW0 END 
 LAPW0 END
 LAPW0 END 
 LAPW0 END
 LAPW0 END
 LAPW0 END
 LAPW0 END
 LAPW0 END
 LAPW0 END
 LAPW1 END 
 LAPW1 END 
 LAPW1 END 
 LAPW1 END 
 LAPW1 END
 LAPW1 END 
 LAPW1 END
 LAPW1 END
 LAPW1 END
 LAPW1 END
 LAPW1 END
 LAPW1 END
 LAPW1 END 
 LAPW1 END 
 LAPW1 END 
 LAPW1 END 
 LAPW1 END
 LAPW1 END 
 LAPW1 END
 LAPW1 END
 LAPW1 END
 LAPW1 END
 LAPW1 END
 LAPW1 END
forrtl: severe (41): insufficient virtual memory
Image              PC            Routine            Line     Source
libintlc.so.5      2B0540E88F7A  Unknown            Unknown  Unknown
libintlc.so.5      2B0540E87AF5  Unknown            Unknown  Unknown
libifcoremt.so.5   2B0540058CF2  Unknown            Unknown  Unknown
libifcoremt.so.5   2B053FFCAAAB  Unknown            Unknown  Unknown
libifcoremt.so.5   2B054001AFBA  Unknown            Unknown  Unknown
libifcoremt.so.5   2B054001AE11  Unknown            Unknown  Unknown
lapwso             004281C0      MAIN__             131      lapwso.f
lapwso             00402A9C      Unknown            Unknown  Unknown
libc.so.6          003CFA61D974  Unknown            Unknown  Unknown
lapwso             004029A9      Unknown            Unknown  Unknown
forrtl: severe (41): insufficient virtual memory
Image              PC            Routine            Line     Source
libintlc.so.5      2B5D32256F7A  Unknown            Unknown  Unknown
libintlc.so.5      2B5D32255AF5  Unknown            Unknown  Unknown
libifcoremt.so.5   2B5D31426CF2  Unknown            Unknown  Unknown
libifcoremt.so.5   2B5D31398AAB  Unknown            Unknown  Unknown
libifcoremt.so.5   2B5D313E8FBA  Unknown            Unknown  Unknown
libifcoremt.so.5   2B5D313E8E11  Unknown            Unknown  Unknown
lapwso             00409A6A      hmsout_mp_init_hm  78       modules.f
lapwso             004280E2      MAIN__             130      lapwso.f
lapwso             00402A9C      Unknown            Unknown  Unknown
libc.so.6          003CFA61D974  Unknown            Unknown  Unknown
~~ abbreviation ~~

Note that the compilation completed without any error messages.
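
One thing that might be worth checking is which memory limits are in effect inside the batch job. Whether such a limit is actually the cause of the error is not established here; this is only a sketch of how the limits could be printed from within the job script, using Python's standard resource module on Linux. The helper names memory_limits and fmt are made up for illustration.

```python
import resource


def memory_limits():
    """Return {name: (soft, hard)} for memory-related limits of this process."""
    wanted = ("RLIMIT_AS", "RLIMIT_DATA", "RLIMIT_STACK")
    return {name: resource.getrlimit(getattr(resource, name)) for name in wanted}


def fmt(value):
    """Render a limit in bytes, or 'unlimited' if no limit is set."""
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"


if __name__ == "__main__":
    for name, (soft, hard) in memory_limits().items():
        print(f"{name}: soft={fmt(soft)}, hard={fmt(hard)}")
```

Running this on a compute node from inside the job, rather than on the login node, shows the limits the parallel processes actually inherit.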

Any advice will be greatly appreciated!



Hyun-Jung Kim (Ph.D student)| phone : ++82 10 7335 7889
Department of Physics   | 
Hanyang University  | e-mail: angpangmokjang at hanmail.net 
17 Haengdang-Dong   | 
133-791 Seongdong-Ku,Seoul/Korea|

www: http://physics.hanyang.ac.kr/~sst/


[Wien] forrtl: severe (41): insufficient virtual memory (file attached!!)

2012-04-15 Thread hyunjung kim
Dear all,

(I'm sorry, I forgot to attach the file containing the error messages and job 
script files.)


Hyun-Jung Kim (Ph.D student)| phone : ++82 10 7335 7889
Department of Physics   | 
Hanyang University  | e-mail: angpangmokjang at hanmail.net 
17 Haengdang-Dong   | 
133-791 Seongdong-Ku,Seoul/Korea|

www: http://physics.hanyang.ac.kr/~sst/

-- next part --
A non-text attachment was scrubbed...
Name: error.zip
Type: application/zip
Size: 8025 bytes
Desc: not available
URL: 
http://zeus.theochem.tuwien.ac.at/pipermail/wien/attachments/20120415/b93c185d/attachment.zip

[Wien] Emax in case.in1 within spin-orbit coupling calculations

2012-03-19 Thread hyunjung kim
Dear all,

I have some questions about calculations including spin-orbit coupling (SOC). 

According to the manual, the second-variational procedure requires many more 
unoccupied states to be included in the calculation. This can be controlled by 
increasing the energy maximum (Emax) up to which eigenvalues are searched. 

For Emax, which determines the number of eigenvalues to be calculated, I have 
used 2.5 Ry for non-SOC and 5.0 Ry for SOC calculations. Thus, the number of 
eigenvalues in the SOC calculation is about twice that of the non-SOC 
calculation. I have also checked the convergence as a function of Emax. For 
non-SOC calculations, the total energy does not depend on Emax, as expected. 
In SOC calculations, however, the total energy does depend on Emax: a larger 
Emax lowers the total energy. Up to Emax = 10 Ry, I cannot find convergence in 
the total energy. The numbers are given below.

system : Bismuth bulk (including SOC) ; total energy
Emax = 2   ; E(total) =  -86326.2333  Ry
Emax = 2.5 ; E(total) =  -86326.2420  Ry
Emax = 3   ; E(total) =  -86326.2454  Ry
Emax = 4   ; E(total) =  -86326.2527  Ry
Emax = 5   ; E(total) =  -86326.2577  Ry
Emax = 7   ; E(total) =  -86326.2634  Ry
Emax = 10  ; E(total) =  -86326.2662  Ry
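
To make the trend in these numbers easier to see, the step-by-step changes can be computed with a short script. This is only a sketch for illustration; the function name energy_steps is made up, and the data are simply the values listed above.

```python
# (Emax in Ry, total energy in Ry) as listed above for bulk Bi with SOC.
data = [
    (2.0, -86326.2333),
    (2.5, -86326.2420),
    (3.0, -86326.2454),
    (4.0, -86326.2527),
    (5.0, -86326.2577),
    (7.0, -86326.2634),
    (10.0, -86326.2662),
]


def energy_steps(points):
    """Return (emax_lo, emax_hi, dE in mRy, dE/dEmax in mRy/Ry) per interval."""
    steps = []
    for (e0, t0), (e1, t1) in zip(points, points[1:]):
        d_mry = (t1 - t0) * 1000.0  # energy change in mRy
        steps.append((e0, e1, d_mry, d_mry / (e1 - e0)))
    return steps


if __name__ == "__main__":
    for e0, e1, d_mry, slope in energy_steps(data):
        print(f"Emax {e0:4.1f} -> {e1:4.1f} Ry: "
              f"dE = {d_mry:6.2f} mRy, slope = {slope:6.2f} mRy/Ry")
```

Between Emax = 2 and 10 Ry the total energy drops by about 33 mRy in total, and the last step (7 to 10 Ry) still changes it by about 2.8 mRy.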


My questions are:
1. Why do the unoccupied states affect the total energy when SOC is included?
2. Is there a recommended way to choose Emax?
3. Should the total energy converge as a function of Emax?
4. With the same Emax (i.e. the same parameters), can total energies be 
compared between different geometries? 

Thank you.
Best regards,
Hyun-Jung Kim.


Hyun-Jung Kim (Ph.D student)| phone : ++82 10 7335 7889
Department of Physics   | 
Hanyang University  | e-mail: angpangmokjang at hanmail.net 
17 Haengdang-Dong   | 
133-791 Seongdong-Ku,Seoul/Korea|

www: http://physics.hanyang.ac.kr/~sst/


