Re: [Wien] Parallelization and PBS on a single computer

2017-06-30 Thread Yoji Kobayashi

Dear Peter and Gavin,

Thank you for your help. Of course I went over the UG, but your explanations 
cleared things up. I will eventually be doing supercell calculations on TiH and 
Ti surfaces, so I will look into the MPI errors in detail then. PBS works fine 
now, with the #PBS -V line too. Many thanks again.

Yoji



Re: [Wien] Parallelization and PBS on a single computer

2017-06-29 Thread Gavin Abo

/var/spool/torque/mom_priv/jobs/44.milkbar-computer.kage.SC: line 12: 
run_lapw: command not found

Perhaps the environment variables need to be pushed out to all nodes; you 
might try adding the line #PBS -V [1,2] to your job submission script.

[1] http://www.nics.tennessee.edu/node/387
[2] http://www.open-mpi.org/community/lists/users/2008/10/6982.php
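
A minimal sketch of the job script with the #PBS -V line added. It is written
for sh/bash rather than tcsh, which is an assumption: the "line 12: run_lapw:
command not found" message has the form bash prints, so the job appears to be
interpreted by bash. The have_cmd helper and the PBS_O_WORKDIR guard are
illustrative additions, not part of the original script.

```shell
#!/bin/sh
#PBS -q batch
#PBS -l nodes=1:ppn=20
#PBS -l walltime=1:00:00
#PBS -o wien2k_output
#PBS -j oe
#PBS -N wien2k_test
#PBS -V
# -V exports the environment of the submitting shell (including PATH) to the job.

# Hypothetical helper: succeed if the named command is found on the job's PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Run the job body only when started by PBS (PBS_O_WORKDIR is set by the server).
if [ -n "${PBS_O_WORKDIR:-}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1
    if have_cmd run_lapw; then
        run_lapw -i 40 -ec .0001 -I
    else
        # Print the PATH the job really sees, to make the failure easy to diagnose.
        echo "run_lapw not found; PATH=$PATH" >&2
        exit 1
    fi
fi
```

If run_lapw is still not found, the PATH printed into wien2k_output shows
what the job inherited.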


Re: [Wien] Parallelization and PBS on a single computer

2017-06-29 Thread Peter Blaha

I hope you did read the chapter about parallelization in the UG ??

Then you should know what the 3 cases actually do.

A few remarks:
case 2: This is k-point parallelization, and you are running just 1
k-point in each lapw1 job. The time for one k-point is very short
(for standard TiC it should be below 0.1 sec/k). In this mode you
have to spawn 20 jobs (which are additionally delayed by DELAY seconds in
lapw1para_lapw), and this takes MUCH more time than the actual run time
of 20 k-points on a single core.
In essence: you cannot speed up with parallelization to an arbitrary
level; you have to "think", or test each case individually,
until you get a feeling for the optimal number of cores for your
present input. If the single-core time is "nearly zero", parallelization
will not be faster; in fact it will be SLOWER due to parallelization
overhead, and this is what you observe.

PS: In addition: if one k-point on one core takes 10 seconds and you run
20 such jobs in parallel, each single job will be MUCH slower. These
Intel multicore CPUs are "memory-bus limited", i.e. Intel sells you
expensive CPUs with 24 cores, but the memory bus can efficiently handle
far fewer cores. For most (memory-bound) applications these many cores
are in fact useless, and everything slows down when you try to use all
of them.

case 3: This is fine-grained MPI parallelization. Basically the same
thing is happening here: splitting up such a small matrix over 20 cores is
very inefficient and will run slower than a non-parallel run. It is mentioned
explicitly in the UG that you should use mpi-parallelization for cells
with more than 50 atoms.

Your tests will be VERY different, when you use a "big" case with a
larger unit cell.

Strategy:
Use larger cases for parallel tests.
Always monitor your tests with the "top" command, so that you can see
what happens.
Try to use "export OMP_NUM_THREADS=2" (or 4 or 8) and check timings.
This uses 2 (or 4 or 8) cores in all BLAS calls (a large fraction of lapw1).
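
The timing check in the last point can be sketched as a small shell helper.
time_with_threads is a hypothetical name, and the commented usage line assumes
it is run inside an initialized case directory where "x lapw1" works; adjust
both as needed.

```shell
#!/bin/sh
# Run a command with OMP_NUM_THREADS set to the given value and report the
# elapsed wall-clock time in whole seconds.
time_with_threads() {
    threads=$1
    shift
    start=$(date +%s)
    # The assignment prefix sets the variable only for this one command.
    OMP_NUM_THREADS=$threads "$@"
    end=$(date +%s)
    echo "OMP_NUM_THREADS=$threads: $((end - start)) s"
}

# Usage (assumption: run inside a WIEN2k case directory):
# for n in 1 2 4 8; do time_with_threads "$n" x lapw1; done
```

Watching "top" in a second terminal while this runs shows whether lapw1
really uses more than one core.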

I don't know why your mpi-job crashes in lapw2. There must be more info ...

PBS error: obviously your PBS does not transfer the "environment".
When you type: run_lapw, the system finds this command because it is in
your PATH, which was defined in your .bashrc file.
The PBS job does not take over your environment. Probably you can fix
this by including "source ~/.bashrc" in the script.
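
As an alternative to sourcing a startup file, the job script can set the two
relevant variables itself. This is a sketch under assumptions: WIENROOT is
taken to be the install directory printed in the dayfile
(/home/milkbar/WIEN2k_13), and prepend_path is a hypothetical helper. The
posted script names tcsh in its first line while ~/.bashrc is a bash file, so
setting the variables directly avoids depending on which shell runs the job.

```shell
#!/bin/sh
# Hypothetical helper: prepend a directory to PATH unless it is already listed.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

# ASSUMPTION: install directory as shown in the dayfile; change it to match
# your own installation.
WIENROOT=/home/milkbar/WIEN2k_13
export WIENROOT
prepend_path "$WIENROOT"
export PATH
```

These lines would go near the top of the job script, before the run_lapw call.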


[Wien] Parallelization and PBS on a single computer

2017-06-28 Thread Yoji Kobayashi

Dear Users,

I have some questions/problems regarding parallelization and PBS. 
I’m not sure whether I’m really running in parallel or in serial, and my PBS 
script isn’t working.

===
My system info:
Intel Xeon CPU E5-2630 v2 @2.6 GHz, 24 CPUS
Memory: 32GB
Running Wien2k_13, on Ubuntu 14.04.03
File system: ext4
(This is considered a single node with 24 processors?)
===
My first question is, am I really running a parallel calculation in a 
meaningful way?

What I try:
In w2web, a serial calculation (SCF only)  for the TiC example  (500 k points) 
takes about 25 sec. to converge.
I do the same calculation (starting with a new case) but with parallelization 
enabled in w2web, using a slightly different .machines file for each case:

Case 1:
1:localhost

Case 2 (i.e. 20 lines of below):
1:localhost
1:localhost
…
1:localhost
1:localhost

Case 3
1:localhost:20

(no lines referring to granularity, etc for now)

What I get:
Case 1 computes in about 54 sec;
Case 2 computes in 1min23 sec.;
Case 3 gives an error in running lapw2, see the dayfile below:
-
Calculating YK-016-TiC in /home/milkbar/Yoji/YK-016-TiC
on milkbar-computer with PID 18077
using WIEN2k_13.1 (Release 17/6/2013) in /home/milkbar/WIEN2k_13


start   (Thu Jun 29 14:23:39 JST 2017) with lapw0 (40/99 to go)

cycle 1 (Thu Jun 29 14:23:39 JST 2017) (40/99 to go)

>   lapw0 -p    (14:23:39) starting parallel lapw0 at Thu Jun 29 14:23:39 JST 2017
 .machine0 : processors
running lapw0 in single mode
1.7u 0.0s 0:01.84 98.3% 0+0k 16+440io 0pf+0w
>   lapw1  -p   (14:23:41) starting parallel lapw1 at Thu Jun 29 14:23:41 JST 2017
->  starting parallel LAPW1 jobs at Thu Jun 29 14:23:41 JST 2017
running LAPW1 in parallel mode (using .machines)
1 number_of_parallel_jobs
 localhost localhost localhost localhost localhost localhost localhost 
localhost localhost localhost localhost localhost localhost localhost localhost 
localhost localhost localhost localhost localhost(20) 20 total processes failed 
to start
0.0u 0.0s 0:00.20 10.0% 0+0k 8080+8io 23pf+0w
   Summary of lapw1para:
   localhost k=0 user=0  wallclock=0
0.0u 0.0s 0:02.10 0.9% 0+0k 8208+216io 24pf+0w
>   lapw2 -p    (14:23:43) running LAPW2 in parallel mode
**  LAPW2 crashed!
0.0u 0.0s 0:00.07 28.5% 0+0k 32+104io 0pf+0w
error: command   /home/milkbar/WIEN2k_13/lapw2para lapw2.def   failed

>   stop error
--
Is my “serial” calculation actually processed over 24 CPUs already, so this is 
why it is faster than Case 2? Or am I doing something wrong? Why does Case 3 
crash? 


My second question is about PBS.
I installed torque PBS, and created a queue:

# create default queue
 qmgr -c 'create queue batch'
 qmgr -c 'set queue batch queue_type = execution'
 qmgr -c 'set queue batch started = true'
 qmgr -c 'set queue batch enabled = true'
 qmgr -c 'set queue batch resources_default.walltime = 1:00:00'
 qmgr -c 'set queue batch resources_default.nodes = 1'
 qmgr -c 'set server default_queue = batch'

and followed other instructions on
https://jabriffa.wordpress.com/2015/02/11/installing-torquepbs-job-scheduler-on-ubuntu-14-04-lts/

The PBS system seems to work since I can submit very simple scripts and see 
them on qstat. My problem is that when I try to submit a serial wien2k job via 
PBS, it gives me an error (ultimately of course I’d like to submit them as 
parallel, but because of the ambiguity above I’ve kept it to serial) . Here's 
the PBS script and error message:

 #!/bin/tcsh
 ##PBS -A your_allocation
 # specify the allocation. Change it to your allocation
 #PBS -q batch
 #PBS -l nodes=1:ppn=20
 #PBS -l walltime=1:00:00
 #PBS -o wien2k_output
 #PBS -j oe
 #PBS -N wien2k_test
 cd $PBS_O_WORKDIR
 echo hello
 run_lapw -i 40 -ec .0001 -I

Error message (contents of wien2k_output):
hello
/var/spool/torque/mom_priv/jobs/44.milkbar-computer.kage.SC: line 12: run_lapw: 
command not found

The job is listed as complete in qstat, and the “hello” is written into the 
wien2k_output file. Changing the cd $PBS_O_WORKDIR to the path for the current 
case hasn’t changed anything. I can run run_lapw from the command line fine, 
though. Also, what do I write for allocation? (I commented it out, as I see 
other PBS scripts don’t always have this.)

I’ve also tried the parallel case, with the following PBS script. I set up the 
.structure file and do the initialization with w2web. I leave the “parallel 
calculation” option unchecked when setting up the case file in w2web.

 #!/bin/tcsh
 ##PBS -A your_allocation
 #PBS -q batch
 #PBS -l nodes=1:ppn=20
 #PBS -l walltime=1:00:00
 #
 #PBS -o wien2k_output
 #PBS -j oe
 #PBS -N wien2k_test
 cd $PBS_O_WORKDIR
 #
 #cat $PBS_NODEFILE |cut -c1-6 >.machines_currentdd
 #set aa=`wc .machines_current`
 #echo '#' > .machines
 #
 ##example for k-point parallel lapw1/2
 set i=1
while ($i <= $aa[1] )
echo -n '1:' >>.machines
head -$i

[Wien] Parallelization on one computer with 8 cpus

2014-03-17 Thread Juliet parker

Hi dear WIEN2k users,

I want to run WIEN2k in parallel mode (i.e. run_lapw -p -cc 0.0001) on a
single computer that has 8 CPUs, so that all 8 CPUs are used.
I have read the user guide (section 5.5.3), but it does not explain
parallelization on a single computer.
The default “.machines” file on my system is as follows:

1:localhost
1:localhost
granularity:1
extrafine:1

As I understand from the user guide, I should edit this file as follows (I have
74 k-points):

granularity:1
9:localhost
9:localhost
9:localhost
9:localhost
9:localhost
9:localhost
9:localhost
9:localhost
residue: localhost:2
lapw0: localhost:8
dstart: localhost:8
# 9 k-points * 8 CPUs = 72 of the 74 k-points, so the 2 remaining k-points
# were assigned to “residue:localhost:2”

OR as:

granularity:1
74:localhost:8
lapw0: localhost:8
dstart: localhost:8

Are such “.machines” files correct?
Could you please help me with this problem?
Thank you
J.Parker
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html

Re: [Wien] Parallelization on one computer with 8 cpus

2014-03-17 Thread Peter Blaha


Please read the UG more carefully.
I think parallelization is explained pretty well.
In particular you have to understand the difference between k-parallel 
and fine-grained mpi-parallel mode.


The syntax for lapw1/2 parallelization is:

speed:hostname[:cores for mpi] [other_hostnames_for_mpi]

So the first number is a relative speed number. Useful if you want to 
combine a fast and a slow computer in k-parallel mode. For you: always 1


The second field is the hostname; in your case it can be localhost.

:8  would mean 8-fold mpi-parallel. It requires that you have properly 
installed the mpi-version (this is a bit more difficult) AND that you 
have a bigger case which runs reasonably in mpi-mode. If you need 74 
k-points, most likely this is NOT the case.


I'd set OMP_NUM_THREADS=2 and then use:

granularity:1
1:localhost
1:localhost
1:localhost
1:localhost
residue: localhost
#  only if you have mpi installed !!
lapw0: localhost:8
dstart: localhost:8

This will start 4 jobs with 18 k-points each and a residue job with 2 k-points.
Each will use 2 cores because of OMP_NUM_THREADS, provided WIEN2k is 
installed properly.
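
A small sketch for generating such a k-point-parallel .machines file and for
checking the 18 + 2 split quoted above. make_machines is a hypothetical helper
name, and the lapw0/dstart lines are left out on the assumption that MPI is
not installed.

```shell
#!/bin/sh
# Hypothetical helper: print a k-point-parallel .machines file with one
# "1:host" line per job, followed by a residue line.
make_machines() {
    njobs=$1
    host=${2:-localhost}
    echo "granularity:1"
    i=0
    while [ "$i" -lt "$njobs" ]; do
        echo "1:$host"
        i=$((i + 1))
    done
    echo "residue: $host"
}

# Integer division behind the numbers quoted above: 74 k-points over 4 jobs.
nk=74
jobs_wanted=4
echo "k-points per job: $((nk / jobs_wanted)), residue: $((nk % jobs_wanted))"

# Usage (writes into the current case directory):
# make_machines 4 localhost > .machines
```

The echo line prints "k-points per job: 18, residue: 2", which matches the
4 x 18 + 2 = 74 k-points in this example.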



--

  P.Blaha
--
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: bl...@theochem.tuwien.ac.at    WWW: http://info.tuwien.ac.at/theochem/
