Re: [Wien] problem in parallel mode calculation

2017-03-13 Thread Gavin Abo

The .machines file looks fine to me, but one of the others might see 
something that I didn't notice (besides the WIEN2k command not being 
there at the bottom of the file - likely missed in the copy and paste).


The main problem seems to be the "bash: lapw1: command not found" unless 
something happened earlier that is not shown.  Tracking down parallel 
error messages is more complicated.  Unlike a serial calculation that 
can output the standard output and error to the display of a terminal on 
a desktop, a parallel calculation on a cluster with a queue system can 
put them in a standard output (-o) and standard error file (-e) or a 
combined output/error file (-j) with user specified name(s) [1,2].  They 
can also be written to the hidden dot files like .time* or .stdout* as 
mentioned before [3,4,5].
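
As a rough sketch of where to look, something like the following could be used to search those files for error strings. The file names job.out and job.err (as would be set with "#PBS -o job.out" and "#PBS -e job.err") and the function name scan_job_logs are made-up examples, not anything that WIEN2k or PBS requires:

```shell
#!/bin/bash
# Hypothetical helper: search queue output files and the hidden dot files
# in a case directory for common error strings.
# Usage: scan_job_logs [directory]   (defaults to the current directory)
scan_job_logs() {
    local dir="${1:-.}"
    local f
    # job.out / job.err are assumed names chosen with "#PBS -o" / "#PBS -e";
    # .stdout* and .time* are the hidden files mentioned above.
    for f in "$dir"/job.out "$dir"/job.err "$dir"/.stdout* "$dir"/.time*; do
        [ -f "$f" ] || continue      # skip names/patterns that matched nothing
        # -H prints the file name, -n the line number, -i ignores case
        grep -H -n -i -E 'command not found|error|no such file' "$f"
    done
    return 0
}
```

Running "scan_job_logs ." in the case directory after a failed job would then list every matching line together with the file it came from.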


The "lapw1: command not found" might be because $WIENROOT didn't get 
added to the PATH on one of the nodes 
[ http://www.supercluster.org/pipermail/torqueusers/2010-March/010143.html ].
Did you try checking whether the path to WIEN2k is in the PATH, for 
example in PBS_O_PATH as shown by "qstat -f jobid" 
[ http://stackoverflow.com/questions/21248406/sleep-command-not-found-in-torque-pbs-but-works-in-shell ]?
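
A minimal sketch of such a PATH check, which could be pasted into the job file after the "module load wien2k" line, is given here. The function name dir_in_path is only something I made up for illustration, and I am assuming that the module is supposed to set WIENROOT on your system:

```shell
#!/bin/bash
# Hypothetical helper: succeed if directory $1 is one of the entries in the
# colon-separated list $2 (defaults to the current PATH).
dir_in_path() {
    local dir="${1%/}"               # drop one trailing slash, if any
    local list="${2:-$PATH}"
    case ":$list:" in
        *":$dir:"*|*":$dir/:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report what the current shell sees.
if [ -n "$WIENROOT" ] && dir_in_path "$WIENROOT"; then
    echo "WIENROOT ($WIENROOT) is in PATH"
else
    echo "WIENROOT is unset or not in PATH"
fi
# Print the location of lapw1 if the shell can find it.
command -v lapw1 || echo "lapw1 not found in PATH"
```

This only tells you about the node where the job script itself runs; the other nodes have to be checked separately.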


Did you try to ssh into all 8 nodes and see if you can see lapw1 on each 
node?  For example,


ssh n024
ls -l $WIENROOT/lapw1

ssh n225
ls -l $WIENROOT/lapw1

...
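
Instead of logging in to each node by hand, the same check could be looped over the hosts listed in the .machines file. This is only a sketch: the function names machines_hosts and check_lapw1_on_hosts are hypothetical, it only handles simple "1:host" lines like the ones in your file, and it assumes that passwordless ssh between the nodes works:

```shell
#!/bin/bash
# Hypothetical helper: print the unique host names found on the k-point
# lines ("<weight>:<host>") of a .machines file, one per line.
machines_hosts() {
    local file="${1:-.machines}"
    # keep lines starting with "<number>:", take the field after the first
    # colon, keep only the first word of it, and remove duplicates
    grep -E '^[0-9]+:' "$file" | cut -d: -f2 | awk '{print $1}' | sort -u
}

# Hypothetical helper: on every host, list lapw1 under $WIENROOT as seen by
# a non-interactive ssh shell on that host (the single quotes make sure
# $WIENROOT is expanded remotely, not locally).
check_lapw1_on_hosts() {
    local host
    for host in $(machines_hosts "$1"); do
        echo "== $host"
        ssh "$host" 'ls -l "$WIENROOT"/lapw1' </dev/null
    done
}
```

Calling "check_lapw1_on_hosts .machines" from the case directory would then print one "ls -l" result (or error) per node.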

Above, I'm just guessing about the commands/configuration for your 
system, but the administrator or helpdesk for your cluster should know 
everything about your system and be able to help you much better with 
resolving the command not found error.


[1] http://beige.ucs.indiana.edu/I590/node39.html
[2] https://wikis.nyu.edu/display/NYUHPC/Tutorial+-+Submitting+a+job+using+qsub
[3] http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg13598.html
[4] http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg14148.html
[5] http://zeus.theochem.tuwien.ac.at/pipermail/wien/2017-March/026109.html

On 3/13/2017 1:25 PM, shaymlal dayananda wrote:

Dear developers and users

I was trying to do a volume optimization and an scf calculation with 
spin polarization in parallel mode. But both of my jobs crash, and I 
get the following error file. However, both cases run correctly when 
parallel mode is removed.


'LAPW2' - can't open unit: 30
 'LAPW2' -filename: case.energyup_1
**  testerror: Error in Parallel LAPW2
.
Also, in STDOUT I see the following errors:

...
bash: lapw1: command not found
...

.
FERMI - Error
grep: *scf1dn*: No such file or directory
0.381u 0.507s 1:12.66 1.2%    0+0k 128+1736io 1pf+0w
Test-TiC-VOl-parallel.scf1dn_1: No such file or directory.
.


I have copied my machines file and the job file here. I think they are 
not correct, and I am not sure whether I need separate lines for lapw2 
and lapwsp. Any help in correcting this is highly appreciated.


".machines" file
.
#
lapw0:n024  n225  n220  n218  n045  n044  n043  n043
1:n024
1:n225
1:n220
1:n218
1:n045
1:n044
1:n043
1:n043
granularity:1
extrafine:1

..

The job file is copied below.


# example for 8 nodes
#PBS -l procs=8
#PBS -l pmem=2048mb
#PBS -l walltime=4:00:00

module load wien2k

# change into your working directory
cd $PBS_O_WORKDIR
#start creating .machines
cat $PBS_NODEFILE |cut -c1-6 >.machines_current
aa=`cat .machines_current | wc -l`
echo '#' > .machines

# example for an MPI parallel lapw0
echo -n 'lapw0:' >> .machines
i=1
while [ $i -lt $aa ]
do
echo -n `cat $PBS_NODEFILE |head -$i | tail -1` ' ' >>.machines
i=$((i+1))
done
echo  `cat $PBS_NODEFILE |head -$i|tail -1` ' ' >>.machines

#example for k-point parallel lapw1/2
i=1
while [ $i -le $aa ]
do
echo -n '1:' >>.machines
head -$i .machines_current |tail -1 >> .machines
i=$((i+1))
done

echo 'granularity:1' >>.machines
echo 'extrafine:1' >>.machines

#define here your WIEN2k command





Thank you

Chami
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html

[Wien] problem in parallel mode calculation

2017-03-13 Thread shaymlal dayananda

Dear developers and users
I was trying to do a volume optimization and an scf calculation with spin 
polarization in parallel mode. But both of my jobs crash, and I get the 
following error file. However, both cases run correctly when parallel mode is 
removed.

'LAPW2' - can't open unit: 30
 'LAPW2' -    filename: case.energyup_1
**  testerror: Error in Parallel LAPW2
.
Also, in STDOUT I see the following errors:

...
bash: lapw1: command not found
...
.
FERMI - Error
grep: *scf1dn*: No such file or directory
0.381u 0.507s 1:12.66 1.2%    0+0k 128+1736io 1pf+0w
Test-TiC-VOl-parallel.scf1dn_1: No such file or directory.
.

I have copied my machines file and the job file here. I think they are not 
correct, and I am not sure whether I need separate lines for lapw2 and lapwsp. 
Any help in correcting this is highly appreciated.

".machines" file
.
#
lapw0:n024  n225  n220  n218  n045  n044  n043  n043  
1:n024
1:n225
1:n220
1:n218
1:n045
1:n044
1:n043
1:n043
granularity:1
extrafine:1
..
The job file is copied below.

# example for 8 nodes
#PBS -l procs=8
#PBS -l pmem=2048mb
#PBS -l walltime=4:00:00 

module load wien2k

# change into your working directory
cd $PBS_O_WORKDIR
#start creating .machines
cat $PBS_NODEFILE |cut -c1-6 >.machines_current
aa=`cat .machines_current | wc -l`
echo '#' > .machines

# example for an MPI parallel lapw0 
echo -n 'lapw0:' >> .machines
i=1
while [ $i -lt $aa ]
do
echo -n `cat $PBS_NODEFILE |head -$i | tail -1` ' ' >>.machines
i=$((i+1))
done
echo  `cat $PBS_NODEFILE |head -$i|tail -1` ' ' >>.machines

#example for k-point parallel lapw1/2
i=1
while [ $i -le $aa ]
do
echo -n '1:' >>.machines
head -$i .machines_current |tail -1 >> .machines
i=$((i+1))
done

echo 'granularity:1' >>.machines
echo 'extrafine:1' >>.machines

#define here your WIEN2k command




Thank you
Chami





