[Wien] Parallelization and PBS on a single computer

2017-06-28 Thread Yoji Kobayashi
Dear Users,

I have some questions/problems regarding parallelization and PBS. 
I’m not sure whether I’m really running in parallel rather than in serial, and my 
PBS script isn’t working.

===
My system info:
Intel Xeon CPU E5-2630 v2 @2.6 GHz, 24 CPUS
Memory: 32GB
Running Wien2k_13, on Ubuntu 14.04.03
File system: ext4
(Is this considered a single node with 24 processors?)
===
My first question is, am I really running a parallel calculation in a 
meaningful way?

What I try:
In w2web, a serial calculation (SCF only)  for the TiC example  (500 k points) 
takes about 25 sec. to converge.
I do the same calculation (starting with a new case) but setting 
parallelization in w2web, with slightly different .machine files for each case:

Case 1:
1:localhost

Case 2 (i.e. 20 lines of the following):
1:localhost
1:localhost
…
1:localhost
1:localhost

Case 3
1:localhost:20

(no lines referring to granularity, etc for now)
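For reference, a Case 2 style file does not have to be typed by hand. The following is a minimal sketch, assuming a POSIX shell; make_machines is a hypothetical helper name and not a WIEN2k command. It only prints one "1:host" line per k-point job, exactly as in Case 2, and adds no granularity or other lines.

```shell
#!/bin/sh
# make_machines (hypothetical helper): print one "1:<host>" line per
# k-point-parallel job, in the same layout as Case 2 above.
#   $1 = number of jobs, $2 = host name (defaults to localhost)
make_machines() {
    njobs=$1
    host=${2:-localhost}
    i=1
    while [ "$i" -le "$njobs" ]; do
        echo "1:$host"
        i=$((i + 1))
    done
}

# Example: print the lines for 4 k-point jobs on the local machine
make_machines 4
```

To reproduce Case 2, the output would be redirected into the case directory, e.g. make_machines 20 > .machines.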

What I get:
Case 1 computes in about 54 sec;
Case 2 computes in 1min23 sec.;
Case 3 gives an error in running lapw2, see the dayfile below:
-
Calculating YK-016-TiC in /home/milkbar/Yoji/YK-016-TiC
on milkbar-computer with PID 18077
using WIEN2k_13.1 (Release 17/6/2013) in /home/milkbar/WIEN2k_13


start   (Thu Jun 29 14:23:39 JST 2017) with lapw0 (40/99 to go)

cycle 1 (Thu Jun 29 14:23:39 JST 2017)(40/99 to go)

>   lapw0 -p(14:23:39) starting parallel lapw0 at Thu Jun 29 14:23:39 JST 2017
 .machine0 : processors
running lapw0 in single mode
1.7u 0.0s 0:01.84 98.3% 0+0k 16+440io 0pf+0w
>   lapw1  -p   (14:23:41) starting parallel lapw1 at Thu Jun 29 14:23:41 JST 2017
->  starting parallel LAPW1 jobs at Thu Jun 29 14:23:41 JST 2017
running LAPW1 in parallel mode (using .machines)
1 number_of_parallel_jobs
 localhost localhost localhost localhost localhost localhost localhost localhost localhost localhost localhost localhost localhost localhost localhost localhost localhost localhost localhost localhost(20) 20 total processes failed to start
0.0u 0.0s 0:00.20 10.0% 0+0k 8080+8io 23pf+0w
   Summary of lapw1para:
   localhost k=0 user=0  wallclock=0
0.0u 0.0s 0:02.10 0.9% 0+0k 8208+216io 24pf+0w
>   lapw2 -p(14:23:43) running LAPW2 in parallel mode
**  LAPW2 crashed!
0.0u 0.0s 0:00.07 28.5% 0+0k 32+104io 0pf+0w
error: command   /home/milkbar/WIEN2k_13/lapw2para lapw2.def   failed

>   stop error
--
Is my “serial” calculation actually processed over 24 CPUs already, so this is 
why it is faster than Case 2? Or am I doing something wrong? Why does Case 3 
crash? 


My second question is about PBS.
I installed torque PBS, and created a queue:

# create default queue
 qmgr -c 'create queue batch'
 qmgr -c 'set queue batch queue_type = execution'
 qmgr -c 'set queue batch started = true'
 qmgr -c 'set queue batch enabled = true'
 qmgr -c 'set queue batch resources_default.walltime = 1:00:00'
 qmgr -c 'set queue batch resources_default.nodes = 1'
 qmgr -c 'set server default_queue = batch'

and followed other instructions on
https://jabriffa.wordpress.com/2015/02/11/installing-torquepbs-job-scheduler-on-ubuntu-14-04-lts/

The PBS system seems to work, since I can submit very simple scripts and see 
them in qstat. My problem is that when I try to submit a serial WIEN2k job via 
PBS, it gives me an error (ultimately, of course, I’d like to submit jobs in 
parallel, but because of the ambiguity above I’ve kept it serial). Here are 
the PBS script and the error message:

 #!/bin/tcsh
 ##PBS -A your_allocation
 # specify the allocation. Change it to your allocation
 #PBS -q batch
 #PBS -l nodes=1:ppn=20
 #PBS -l walltime=1:00:00
 #PBS -o wien2k_output
 #PBS -j oe
 #PBS -N wien2k_test
 cd $PBS_O_WORKDIR
 echo hello
 run_lapw -i 40 -ec .0001 -I

Error message (contents of wien2k_output):
hello
/var/spool/torque/mom_priv/jobs/44.milkbar-computer.kage.SC: line 12: run_lapw: command not found

The job is listed as complete in qstat, and the “hello” is written into the 
wien2k_output file. Changing the cd $PBS_O_WORKDIR to the path for the current 
case hasn’t changed anything. I can run run_lapw from the command line fine, 
though. Also, what do I write for allocation? (I commented it out, as I see 
other PBS scripts don’t always have this.)
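One thing that can be checked from inside the job is whether run_lapw is on the PATH that the batch shell actually sees. The "line 12: ... command not found" wording looks like a Bourne-type shell message rather than a tcsh one, so the job may not be reading the same startup files as the interactive login. That is an assumption, not something confirmed here. The sketch below is written for a POSIX shell; have_cmd is a hypothetical helper, and the WIENROOT path is simply copied from the dayfile above.

```shell
#!/bin/sh
# have_cmd (hypothetical helper): succeed if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Assumption: WIEN2k lives where the dayfile above reports it.
WIENROOT=${WIENROOT:-/home/milkbar/WIEN2k_13}

# If the batch environment does not know run_lapw, prepend WIENROOT to PATH.
if ! have_cmd run_lapw; then
    PATH="$WIENROOT:$PATH"
    export WIENROOT PATH
fi

# Report what the job can see before starting the SCF cycle.
if have_cmd run_lapw; then
    echo "run_lapw found at $(command -v run_lapw)"
else
    echo "run_lapw still not found; check WIENROOT=$WIENROOT"
fi
```

Placing these lines before the run_lapw call in the job script would at least show in wien2k_output whether the environment is the problem.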

I’ve also tried the parallel case, with the following PBS script. I set up the 
.structure file and do the initialization with w2web. I leave the “parallel 
calculation” option unchecked when setting up the case file in w2web.

 #!/bin/tcsh
 ##PBS -A your_allocation
 #PBS -q batch
 #PBS -l nodes=1:ppn=20
 #PBS -l walltime=1:00:00
 #
 #PBS -o wien2k_output
 #PBS -j oe
 #PBS -N wien2k_test
 cd $PBS_O_WORKDIR
 #
 #cat $PBS_NODEFILE |cut -c1-6 >.machines_currentdd
 #set aa=`wc .machines_current`
 #echo '#' > .machines
 #
 ##example for k-point parallel lapw1/2
 set i=1
while ($i <= $aa[1] )
echo -n '1:' >>.machines
head -$i 
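The script above breaks off at this point. Independently of how it continued, the idea of turning $PBS_NODEFILE into a k-point-parallel .machines file can be sketched as follows. This is a minimal sketch for a POSIX shell and not a reconstruction of the original tcsh loop; nodefile_to_machines is a hypothetical helper name, and it assumes the node file lists one host name per line.

```shell
#!/bin/sh
# nodefile_to_machines (hypothetical helper): read host names, one per line
# (the layout assumed for $PBS_NODEFILE), and print one "1:<host>" line each.
nodefile_to_machines() {
    while IFS= read -r host; do
        if [ -n "$host" ]; then
            echo "1:$host"
        fi
    done < "$1"
}

# Inside a PBS job, write the result to .machines in the working directory.
# Outside a job PBS_NODEFILE is unset and nothing is written.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    nodefile_to_machines "$PBS_NODEFILE" > .machines
fi
```

With nodes=1:ppn=20 on a single machine, this would give 20 identical lines, i.e. the same layout as Case 2 above.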

Re: [Wien] Problem Regarding SLATER Calculations

2017-06-28 Thread apande
The STDOUT




hup: Command not found.
STOP  LAPW0 END
STOP  LAPW0 END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LAPW1 END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LAPW1 END
STOP  LAPW2 END
Unmatched ".
STOP  LAPW2 END
Unmatched ".
STOP  CORE  END
STOP  CORE  END
STOP  MIXER END
in cycle 2    ETEST: 0   CTEST: 0
hup: Command not found.
STOP  LAPW0 END
STOP  LAPW0 END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LAPW1 END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LAPW1 END
STOP  LAPW2 END
Unmatched ".
STOP  LAPW2 END
Unmatched ".
STOP  CORE  END
STOP  CORE  END
STOP  MIXER END
in cycle 3    ETEST: 0   CTEST: 0
hup: Command not found.
STOP  LAPW0 END
STOP  LAPW0 END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LAPW1 END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LAPW1 END
STOP  LAPW2 END
Unmatched ".
STOP  LAPW2 END
Unmatched ".
STOP  CORE  END
STOP  CORE  END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  MIXER END
in cycle 4    ETEST: .02503978   CTEST: .182651
hup: Command not found.
STOP  LAPW0 END
STOP  LAPW0 END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LAPW1 END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LAPW1 END
STOP  LAPW2 END
Unmatched ".
STOP  LAPW2 END
Unmatched ".
STOP  CORE  END
STOP  CORE  END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  MIXER END
in cycle 5    ETEST: .081043035000   CTEST: .089823
hup: Command not found.
STOP  LAPW0 END
STOP  LAPW0 END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LAPW1 END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LAPW1 END
STOP  LAPW2 END
Unmatched ".
STOP  LAPW2 END
Unmatched ".
STOP  CORE  END
STOP  CORE  END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  MIXER END
in cycle 6    ETEST: .05105681   CTEST: .018457
hup: Command not found.
STOP  LAPW0 END
STOP  LAPW0 END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LAPW1 END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LAPW1 END
STOP  LAPW2 END
Unmatched ".
STOP  LAPW2 END
Unmatched ".
STOP  CORE  END
STOP  CORE  END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  MIXER END
in cycle 7    ETEST: .01306470   CTEST: .002347
hup: Command not found.
STOP  LAPW0 END
STOP  LAPW0 END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LAPW1 END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LAPW1 END
STOP  LAPW2 END
Unmatched ".
STOP  LAPW2 END
Unmatched ".
STOP  CORE  END
STOP  CORE  END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  MIXER END
in cycle 8    ETEST: .002592665000   CTEST: .000702
hup: Command not found.
STOP  LAPW0 END
STOP  LAPW0 END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LAPW1 END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LAPW1 END
STOP  LAPW2 END
Unmatched ".
STOP  LAPW2 END
Unmatched ".
STOP  CORE  END
STOP  CORE  END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  MIXER END
in cycle 9    ETEST: .00043674   CTEST: .28
hup: Command not found.
STOP  LAPW0 END
STOP  LAPW0 END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LAPW1 END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LAPW1 END
STOP  LAPW2 END
Unmatched ".
STOP  LAPW2 END
Unmatched ".
STOP  CORE  END
STOP  CORE  END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  MIXER END
in cycle 10    ETEST: .000100545000   CTEST: .16
hup: Command not found.
STOP  LAPW0 END
STOP  LAPW0 END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LAPW1 END
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LAPW1 END
STOP  LAPW2 END
Unmatched ".
STOP  LAPW2 END
Unmatched ".
STOP  CORE  END
STOP  CORE  END
Note: 

Re: [Wien] Problem Regarding SLATER Calculations

2017-06-28 Thread tran

Execute run_vnonloc_lapw again and redirect the output to the file STDOUT:
run_vnonloc_lapw -ec 0.0001 -cc 0.0001 -NI -p > & STDOUT &

Then, show us STDOUT.

On Wednesday 2017-06-28 19:25, apa...@iitk.ac.in wrote:


Date: Wed, 28 Jun 2017 19:25:48
From: apa...@iitk.ac.in
Reply-To: A Mailing list for WIEN2k users 
To: wien@zeus.theochem.tuwien.ac.at
Subject: Re: [Wien] Problem Regarding SLATER Calculations

Dear Users,

THE FOLLOWING FILES WERE NOT CREATED

case_SLATER_updated_1.scf
case_SLATER_fixed_1.scf
case_SLATER_updated_2.scf
case_SLATER_fixed_2.scf
case_SLATER_updated_3.scf
case_SLATER_fixed_3.scf
etc.


Note that the user guide says

The hf-module generates the file case.r2v_nonloc which contains the
Slater/SmBJ/KLI potential multiplied by the electron density. This file is
read by lapw0 in order to include the Slater/SmBJ/KLI potential into the
total potential

BUT THERE IS NO SUCH FILE CREATED IN THE DIRECTORY AFTER init_hf_lapw





Also, if I replace VX_SLATER by VX_KLI in the step where case.in0 is edited,
the band gap still comes out as 10.813 eV, i.e. the same value. This
indicates that nothing changes between the two, which cannot be correct.

Please Help



With Regards

Aditya Pande

___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html




Re: [Wien] Problem Regarding SLATER Calculations

2017-06-28 Thread apande
Dear Users,

THE FOLLOWING FILES WERE NOT CREATED

case_SLATER_updated_1.scf
case_SLATER_fixed_1.scf
case_SLATER_updated_2.scf
case_SLATER_fixed_2.scf
case_SLATER_updated_3.scf
case_SLATER_fixed_3.scf
etc.


Note that the user guide says

The hf-module generates the file case.r2v_nonloc which contains the
Slater/SmBJ/KLI potential multiplied by the electron density. This file is
read by lapw0 in order to include the Slater/SmBJ/KLI potential into the
total potential

BUT THERE IS NO SUCH FILE CREATED IN THE DIRECTORY AFTER init_hf_lapw





Also, if I replace VX_SLATER by VX_KLI in the step where case.in0 is edited,
the band gap still comes out as 10.813 eV, i.e. the same value. This
indicates that nothing changes between the two, which cannot be correct.

Please Help



With Regards

Aditya Pande



Re: [Wien] Problem Regarding SLATER Calculations

2017-06-28 Thread tran

Not this. If your calculation with run_vnonloc_lapw ran
properly, then the following scf files should have been created:

case_SLATER_updated_1.scf
case_SLATER_fixed_1.scf
case_SLATER_updated_2.scf
case_SLATER_fixed_2.scf
case_SLATER_updated_3.scf
case_SLATER_fixed_3.scf
etc.

Are there such files in the directory?

On Wednesday 2017-06-28 14:31, apa...@iitk.ac.in wrote:


Date: Wed, 28 Jun 2017 14:31:38
From: apa...@iitk.ac.in
Reply-To: A Mailing list for WIEN2k users 
To: wien@zeus.theochem.tuwien.ac.at
Subject: Re: [Wien] Problem Regarding SLATER Calculations

The following scf files were created in the directory:

case.scf
case.scf0
case.scfm
case.scf1dn
case.scf1up
case.scf2dn
case.scf2up
case.scfcdn
case.scfcup
case.scf0_grr

The band gap was read from case.scf2up or case.scf2dn (both had the same value).

With Regards
Aditya Pande





Re: [Wien] Problem Regarding SLATER Calculations

2017-06-28 Thread apande
The following scf files were created in the directory:

case.scf
case.scf0
case.scfm
case.scf1dn
case.scf1up
case.scf2dn
case.scf2up
case.scfcdn
case.scfcup
case.scf0_grr

The band gap was read from case.scf2up or case.scf2dn (both had the same value).

With Regards
Aditya Pande



Re: [Wien] The unit of -kbT option

2017-06-28 Thread Peter Blaha

It is in Ry.

So -kbt 0.004 corresponds to 4 mRy.
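The arithmetic can be checked with a small conversion. This is a minimal sketch, assuming a POSIX shell with awk; ry_to_units is a hypothetical helper name, and the factor 1 Ry = 13.605693 eV is a rounded value that is not taken from this thread.

```shell
#!/bin/sh
# ry_to_units (hypothetical helper): print a value given in Ry as mRy and eV.
# Uses 1 Ry = 1000 mRy and 1 Ry = 13.605693 eV (rounded).
ry_to_units() {
    LC_ALL=C awk -v ry="$1" \
        'BEGIN { printf "%.1f mRy, %.4f eV\n", ry * 1000, ry * 13.605693 }'
}

ry_to_units 0.004    # prints "4.0 mRy, 0.0544 eV"
```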

On 06/28/2017 09:37 AM, Wien2k User wrote:

Dear Wien2k users;


Is the unit of -kbT "xxx" in the NMR calculation mRy or Ry?





--

  P.Blaha
--
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: bl...@theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at/TC_Blaha
--


[Wien] The unit of -kbT option

2017-06-28 Thread Wien2k User
Dear Wien2k users;


Is the unit of -kbT "xxx" in the NMR calculation mRy or Ry?