It crashed with the message "Host key verification failed."

It seems that your cluster does not allow ssh to an allocated node (ask your sysadmin).

In $WIENROOT/WIEN2k_parallel_options there are variables like USE_REMOTE. If it is set to zero, ssh is not used and you can run in parallel, but only on one shared-memory node.
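
To see how this variable is currently set in your installation, a small sketch like the one below may help. It assumes only that the options file is plain text and mentions the name USE_REMOTE; the function name show_use_remote is made up for this example, and the exact csh syntax of the matching line may differ between WIEN2k versions.

```shell
# Hypothetical helper: print (with line numbers) every line of the given
# options file that mentions USE_REMOTE.
show_use_remote() {
    grep -n 'USE_REMOTE' "$1"
}
```

It would be called as show_use_remote "$WIENROOT/WIEN2k_parallel_options"; the value on the reported line is the one to set to zero for single-node runs without ssh.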

In order to use multiple nodes, you need to be able to do passwordless ssh to the allocated nodes (or any other command substituting ssh).
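
Whether passwordless ssh works can be tested from inside a running job, before starting run_lapw. The following is only a sketch: the function name check_passwordless_ssh is invented here, and it relies on the standard OpenSSH options BatchMode (fail instead of prompting for a password) and ConnectTimeout.

```shell
# Hypothetical helper: try a non-interactive ssh to each host given as an
# argument and report the result. Returns non-zero if any host fails.
check_passwordless_ssh() {
    status=0
    for host in "$@"; do
        # "true" is a harmless remote command; only the exit status matters.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "$host: ok"
        else
            echo "$host: ssh failed"
            status=1
        fi
    done
    return "$status"
}
```

For the job quoted below, check_passwordless_ssh lxbk1177 would be the relevant call. As the log shows, with ssh in use even the node the job itself runs on is contacted through ssh, so the check is meaningful for a single-node .machines file too.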


Here is the content of the file /lustre/ukt/milias/scratch/Wien2k_23.2_job.main.N1.n4.jid3009460/LvO2onQg/.machines:
1:lxbk1177
1:lxbk1177
1:lxbk1177
1:lxbk1177
1:lxbk1177
1:lxbk1177
1:lxbk1177
1:lxbk1177
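
A listing like the one above can be summarised per host and compared with the number of CPUs allocated to the job. The sketch below is an illustration only: the function name machines_summary is invented, and it handles just the simple "weight:host" lines shown here (on lines that name several hosts, only the first host is counted).

```shell
# Hypothetical helper: count the k-point parallel lines ("N:host...") per host
# in a .machines file and print "count host", one host per line.
machines_summary() {
    awk -F: '
        /^[0-9]+:/ {
            split($2, parts, " ")   # keep only the first host name on the line
            count[parts[1]]++
        }
        END { for (h in count) print count[h], h }
    ' "$1" | sort -k2
}
```

Called as machines_summary .machines in the case directory, it prints one line per host; a single line of output means all parallel jobs are meant to run on one node.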

The job is running on lxbk1177, with 8 CPUs allocated;

and this is from the log:

running x dstart :
starting parallel dstart at Tue 20 Jun 2023 05:16:21 PM CEST
-------- .machine0 : processors
running dstart in single mode
STOP DSTART ENDS
10.249u 0.322s 0:11.19 94.3%    0+0k 158496+101160io 437pf+0w

running 'run_lapw -p -ec 0.0001 -NI'
STOP  LAPW0 END
Host key verification failed.
[1]  + Done                          ( ( $remote $machine[$p] "cd $PWD;$set_OMP_NUM_THREADS;$t $taskset0 $exe ${def}_$loop.def ;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p] ) >& .stdout1_$loop; if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw .stdout1_$loop > .temp1_$loop; grep \% .temp1_$loop >> .time1_$loop; grep -v \% .temp1_$loop | perl -e "print stderr <STDIN>" )
Host key verification failed.
[1]  + Done                          ( ( $remote $machine[$p] "cd $PWD;$set_OMP_NUM_THREADS;$t $taskset0 $exe ${def}_$loop.def ;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p] ) >& .stdout1_$loop; if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw .stdout1_$loop > .temp1_$loop; grep \% .temp1_$loop >> .time1_$loop; grep -v \% .temp1_$loop | perl -e "print stderr <STDIN>" )
Host key verification failed.
[1]  + Done                          ( ( $remote $machine[$p] "cd $PWD;$set_OMP_NUM_THREADS;$t $taskset0 $exe ${def}_$loop.def ;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p] ) >& .stdout1_$loop; if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw .stdout1_$loop > .temp1_$loop; grep \% .temp1_$loop >> .time1_$loop; grep -v \% .temp1_$loop | perl -e "print stderr <STDIN>" )
Host key verification failed.
[1]  + Done                          ( ( $remote $machine[$p] "cd $PWD;$set_OMP_NUM_THREADS;$t $taskset0 $exe ${def}_$loop.def ;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p] ) >& .stdout1_$loop; if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw .stdout1_$loop > .temp1_$loop; grep \% .temp1_$loop >> .time1_$loop; grep -v \% .temp1_$loop | perl -e "print stderr <STDIN>" )
Host key verification failed.
[1]  + Done                          ( ( $remote $machine[$p] "cd $PWD;$set_OMP_NUM_THREADS;$t $taskset0 $exe ${def}_$loop.def ;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p] ) >& .stdout1_$loop; if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw .stdout1_$loop > .temp1_$loop; grep \% .temp1_$loop >> .time1_$loop; grep -v \% .temp1_$loop | perl -e "print stderr <STDIN>" )
Host key verification failed.
[1]  + Done                          ( ( $remote $machine[$p] "cd $PWD;$set_OMP_NUM_THREADS;$t $taskset0 $exe ${def}_$loop.def ;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p] ) >& .stdout1_$loop; if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw .stdout1_$loop > .temp1_$loop; grep \% .temp1_$loop >> .time1_$loop; grep -v \% .temp1_$loop | perl -e "print stderr <STDIN>" )
Host key verification failed.
[1]  + Done                          ( ( $remote $machine[$p] "cd $PWD;$set_OMP_NUM_THREADS;$t $taskset0 $exe ${def}_$loop.def ;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p] ) >& .stdout1_$loop; if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw .stdout1_$loop > .temp1_$loop; grep \% .temp1_$loop >> .time1_$loop; grep -v \% .temp1_$loop | perl -e "print stderr <STDIN>" )
Host key verification failed.
[1]    Done                          ( ( $remote $machine[$p] "cd $PWD;$set_OMP_NUM_THREADS;$t $taskset0 $exe ${def}_$loop.def ;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p] ) >& .stdout1_$loop; if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw .stdout1_$loop > .temp1_$loop; grep \% .temp1_$loop >> .time1_$loop; grep -v \% .temp1_$loop | perl -e "print stderr <STDIN>" )
LvO2onQg.scf1_1: No such file or directory.
grep: *scf1*: No such file or directory
STOP FERMI - Error
cp: cannot stat '.in.tmp': No such file or directory
grep: *scf1*: No such file or directory

>   stop error



file ":parallel"

starting parallel lapw1 at Tue 20 Jun 2023 05:17:08 PM CEST
    lxbk1177(4)      lxbk1177(3)      lxbk1177(3)      lxbk1177(3)      lxbk1177(3)      lxbk1177(3)      lxbk1177(3)      lxbk1177(3)    Summary of lapw1para:
  lxbk1177      k=25    user=0  wallclock=0
<-  done at Tue 20 Jun 2023 05:17:14 PM CEST
-----------------------------------------------------------------
->  starting Fermi on lxbk1177 at Tue 20 Jun 2023 05:17:15 PM CEST
**  LAPW2 crashed at Tue 20 Jun 2023 05:17:16 PM CEST
**  check ERROR FILES!
-----------------------------------------------------------------





_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST 
at:http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html

--
-----------------------------------------------------------------------
Peter Blaha,  Inst. f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-158801165300
Email:peter.bl...@tuwien.ac.at WWW:http://www.imc.tuwien.ac.at WIEN2k:http://www.wien2k.at
-------------------------------------------------------------------------