[Wien] Pbnma and Pna2_1 SG

2017-06-30 Thread Dr. K. C. Bhamu
Dear Users,

I am doing a calculation of an orthorhombic complex perovskite. I want to
preserve my a~b___
Wien mailing list

Re: [Wien] Parallelization and PBS on a single computer

2017-06-30 Thread Yoji Kobayashi
Dear Peter and Gavin,

Thank you for your help. Of course I went over the UG, but your explanations 
cleared things up. I will eventually be doing supercell calculations on TiH and 
Ti surfaces, so I will look into the MPI errors in detail then. PBS works fine 
now, with the #PBS -V directive too. Many thanks again.
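
For anyone finding this thread later, here is a minimal sketch of the job script with the -V line added. The PBS directives and the run_lapw call are the ones from my original script quoted below. The helper name write_job_script and the file name wien2k_job.pbs are only illustrative assumptions. The -V directive exports the environment of the submitting shell to the job, which is presumably why run_lapw is found now.

```shell
#!/bin/sh
# Sketch only: write_job_script is a hypothetical helper, not a WIEN2k or Torque tool.
# It writes a Torque/PBS job script for a serial WIEN2k SCF run.
write_job_script() {
    out="${1:-wien2k_job.pbs}"   # output file name is an assumption
    cat > "$out" <<'EOF'
#!/bin/tcsh
#PBS -q batch
#PBS -l nodes=1:ppn=20
#PBS -l walltime=1:00:00
#PBS -o wien2k_output
#PBS -j oe
#PBS -N wien2k_test
#PBS -V
# -V passes the submitting shell's environment variables to the job
cd $PBS_O_WORKDIR
run_lapw -i 40 -ec .0001 -I
EOF
}

# Usage (run from the case directory):
#   write_job_script wien2k_job.pbs && qsub wien2k_job.pbs
```

The cd $PBS_O_WORKDIR line makes the job start in the directory from which qsub was called, so the script should be submitted from the case directory.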


> On Jun 29, 2017, at 14:49, Yoji Kobayashi  wrote:
> Dear Users,
> I have some questions/problems regarding parallelization and PBS. 
> I’m not sure if I’m really running parallel vs. serial, and my PBS script 
> isn’t working.
> ===
> My system info:
> Intel Xeon CPU E5-2630 v2 @2.6 GHz, 24 CPUS
> Memory: 32GB
> Running Wien2k_13, on Ubuntu 14.04.03
> File system: ext4
> (This is considered a single node with 24 processors?)
> ===
> My first question is, am I really running a parallel calculation in a 
> meaningful way?
> What I try:
> In w2web, a serial calculation (SCF only) for the TiC example (500 
> k-points) takes about 25 sec. to converge.
> I do the same calculation (starting with a new case) but with 
> parallelization enabled in w2web, using a slightly different .machines file 
> for each case:
> Case 1:
> 1:localhost
> Case 2 (i.e. 20 lines of below):
> 1:localhost
> 1:localhost
> …
> 1:localhost
> 1:localhost
> Case 3
> 1:localhost:20
> (no lines referring to granularity, etc for now)
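
A minimal sketch of how the Case 2 file could be generated instead of typed by hand. The function name write_kpoint_machines is a hypothetical helper, not a WIEN2k tool, and the job count is whatever is passed in. Each 1:localhost line requests one k-point-parallel job. The Case 3 form 1:localhost:20 instead requests a single job spread over 20 MPI processes, so it needs the MPI executables.

```shell
#!/bin/sh
# Sketch only: write_kpoint_machines is a hypothetical helper, not part of WIEN2k.
# It writes N lines of "1:localhost", i.e. N k-point-parallel jobs on this machine.
write_kpoint_machines() {
    njobs="$1"                 # number of parallel k-point jobs
    out="${2:-.machines}"      # output file, defaults to .machines in the case directory
    : > "$out"                 # create or truncate the file
    i=0
    while [ "$i" -lt "$njobs" ]; do
        echo "1:localhost" >> "$out"
        i=$((i + 1))
    done
}

# Usage (run from the case directory):
#   write_kpoint_machines 20
```

As in the cases above, no granularity or other option lines are written.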
> What I get:
> Case 1 computes in about 54 sec;
> Case 2 computes in 1min23 sec.;
> Case 3 gives an error in running lapw2, see the dayfile below:
> -
> Calculating YK-016-TiC in /home/milkbar/Yoji/YK-016-TiC
> on milkbar-computer with PID 18077
> using WIEN2k_13.1 (Release 17/6/2013) in /home/milkbar/WIEN2k_13
> start (2017年  6月 29日 木曜日 14:23:39 JST) with lapw0 (40/99 to go)
> cycle 1   (2017年  6月 29日 木曜日 14:23:39 JST)(40/99 to go)
> >   lapw0 -p  (14:23:39) starting parallel lapw0 at 2017年  6月 29日 木曜日 14:23:39 JST
>  .machine0 : processors
> running lapw0 in single mode
> 1.7u 0.0s 0:01.84 98.3% 0+0k 16+440io 0pf+0w
> >   lapw1  -p (14:23:41) starting parallel lapw1 at 2017年  6月 29日 木曜日 14:23:41 JST
> ->  starting parallel LAPW1 jobs at 2017年  6月 29日 木曜日 14:23:41 JST
> running LAPW1 in parallel mode (using .machines)
> 1 number_of_parallel_jobs
>  localhost localhost localhost localhost localhost localhost localhost 
> localhost localhost localhost localhost localhost localhost localhost 
> localhost localhost localhost localhost localhost localhost(20) 20 total 
> processes failed to start
> 0.0u 0.0s 0:00.20 10.0% 0+0k 8080+8io 23pf+0w
>Summary of lapw1para:
>localhost   k=0 user=0  wallclock=0
> 0.0u 0.0s 0:02.10 0.9% 0+0k 8208+216io 24pf+0w
> >   lapw2 -p  (14:23:43) running LAPW2 in parallel mode
> **  LAPW2 crashed!
> 0.0u 0.0s 0:00.07 28.5% 0+0k 32+104io 0pf+0w
> error: command   /home/milkbar/WIEN2k_13/lapw2para lapw2.def   failed
> >   stop error
> --
> Is my “serial” calculation actually already being processed over 24 CPUs, 
> and is that why it is faster than Case 2? Or am I doing something wrong? Why 
> does Case 3 crash? 
> My second question is about PBS.
> I installed torque PBS, and created a queue:
> # create default queue
>  qmgr -c 'create queue batch'
>  qmgr -c 'set queue batch queue_type = execution'
>  qmgr -c 'set queue batch started = true'
>  qmgr -c 'set queue batch enabled = true'
>  qmgr -c 'set queue batch resources_default.walltime = 1:00:00'
>  qmgr -c 'set queue batch resources_default.nodes = 1'
>  qmgr -c 'set server default_queue = batch'
> and followed other instructions on
> https://jabriffa.wordpress.com/2015/02/11/installing-torquepbs-job-scheduler-on-ubuntu-14-04-lts/
> The PBS system seems to work, since I can submit very simple scripts and see 
> them in qstat. My problem is that when I try to submit a serial WIEN2k job 
> via PBS, it gives me an error (ultimately, of course, I’d like to submit 
> jobs as parallel, but because of the ambiguity above I’ve kept it to serial). 
> Here's the PBS script and error message:
>  #!/bin/tcsh
>  ##PBS -A your_allocation
>  # specify the allocation. Change it to your allocation
>  #PBS -q batch
>  #PBS -l nodes=1:ppn=20
>  #PBS -l walltime=1:00:00
>  #PBS -o wien2k_output
>  #PBS -j oe
>  #PBS -N wien2k_test
>  echo hello
>  run_lapw -i 40 -ec .0001 -I
> Error message (contents of wien2k_output):
> hello
> /var/spool/torque/mom_priv/jobs/44.milkbar-computer.kage.SC: line 12: 
> run_lapw: command not found
> The job is listed as complete in qstat, and the “hello” is written into the 
> wien2k_output file. Changing the cd $PBS_O_WORKDIR to the path for the 
> current case hasn’t changed anything. I can run run_lapw from the command 
> line fine, though. Also, what do I