Dear All,
We are facing a strange problem with our Rocks 5.3 installation running Torque (2.4.6) and Maui (3.2.6p21).
Maui reports some nodes as Down, for example:
[root@norma ~]# checknode compute-0-141
checking node compute-0-141
State: Down (in current state for 00:00:00)
Configured Resources: PROCS: 8 MEM: 15G SWAP: 16G DISK: 1M
Utilized Resources: PROCS: 8
Dedicated Resources: [NONE]
Opsys: linux Arch: [NONE]
Speed: 1.00 Load: 0.120
Network: [DEFAULT]
Features: [NONE]
Attributes: [Batch]
Classes: [default 8:8]
Total Time: INFINITY Up: 46:00:40:56 (32.37%) Active: 24:21:46:44 (17.51%)
Reservations:
NOTE: no reservations on node
However, pbsnodes, Torque's node-monitoring utility, shows the same node in the free state:
[root@norma ~]# pbsnodes compute-0-141
compute-0-141
state = free
np = 8
ntype = cluster
status = opsys=linux,uname=Linux compute-0-141.local 2.6.18-164.6.1.el5 #1 SMP Tue Nov 3 16:12:36 EST 2009 x86_64,sessions=? 0,nsessions=? 0,nusers=0,idletime=109343,totmem=17458824kb,availmem=17312400kb,physmem=16438708kb,ncpus=8,loadave=0.06,message=ERROR: prolog/epilog failed,file: /opt/torque/mom_priv/epilogue.parallel,exit: 1,nonzero p/e exit status,netload=21338837309,state=free,jobs=,varattr=,rectime=1298041707
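The message= field in this status line is easy to miss because its value itself contains commas. Below is a minimal Python sketch for pulling it out, assuming every node's status string has the key=value layout shown above. parse_pbs_status is a hypothetical helper, not part of Torque, and the rule it uses (a comma-separated chunk without a bare "key=" prefix belongs to the previous value) is our own assumption:

```python
from typing import Dict, Optional


# Hypothetical helper (not part of Torque): split the "status =" attribute
# printed by pbsnodes into key/value pairs. Assumes the layout shown above,
# where the value of message= may itself contain commas.
def parse_pbs_status(status: str) -> Dict[str, str]:
    fields: Dict[str, str] = {}
    last_key: Optional[str] = None
    for chunk in status.split(","):
        key, sep, value = chunk.partition("=")
        # Start a new field only if the chunk begins with a bare key
        # (no spaces or colons), e.g. "ncpus=8" but not "file: /opt/...".
        if sep and key and " " not in key and ":" not in key:
            fields[key] = value
            last_key = key
        elif last_key is not None:
            # Continuation of the previous value: restore the comma that
            # split() removed.
            fields[last_key] += "," + chunk
    return fields


# Status line of compute-0-141, copied from the pbsnodes output above.
STATUS = (
    "opsys=linux,uname=Linux compute-0-141.local 2.6.18-164.6.1.el5 "
    "#1 SMP Tue Nov 3 16:12:36 EST 2009 x86_64,sessions=? 0,nsessions=? 0,"
    "nusers=0,idletime=109343,totmem=17458824kb,availmem=17312400kb,"
    "physmem=16438708kb,ncpus=8,loadave=0.06,message=ERROR: prolog/epilog "
    "failed,file: /opt/torque/mom_priv/epilogue.parallel,exit: 1,nonzero p/e "
    "exit status,netload=21338837309,state=free,jobs=,varattr=,rectime=1298041707"
)

if __name__ == "__main__":
    fields = parse_pbs_status(STATUS)
    print("state:  ", fields["state"])
    print("message:", fields.get("message", "(none)"))
```

Running it over every node's status line would list the nodes carrying a similar ERROR message, which could then be compared with the nodes Maui reports as Down.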
The affected nodes are running the pbs_mom service:
[root@norma ~]# ssh compute-0-141 service pbs status
pbs_mom is pid 3876
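With many nodes, comparing the two tools by hand gets tedious. The Python sketch below compares the state reported by checknode with the one reported by pbsnodes for a single node. The helper names are hypothetical, the regular expressions assume the output layouts quoted in this message, and the sample text is abbreviated from that output. Collecting the text for each node (for example by looping over the node list and running both commands) is left out:

```python
import re
from typing import Optional


# Hypothetical helpers (not part of Maui or Torque): extract the node state
# from checknode and pbsnodes text, assuming the layouts shown above.
def maui_state(checknode_output: str) -> Optional[str]:
    # "State: Down (in current state for 00:00:00)" -> "Down"
    m = re.search(r"^\s*State:\s*(\S+)", checknode_output, re.MULTILINE)
    return m.group(1) if m else None


def torque_state(pbsnodes_output: str) -> Optional[str]:
    # "state = free" -> "free"; combined states such as "down,offline"
    # are returned whole. The "status = ..." line does not match.
    m = re.search(r"^\s*state\s*=\s*(\S+)", pbsnodes_output, re.MULTILINE)
    return m.group(1) if m else None


def states_disagree(maui: str, torque: str) -> bool:
    # True when exactly one of the two tools considers the node down.
    maui_down = maui.lower() == "down"
    torque_down = "down" in torque.lower().split(",")
    return maui_down != torque_down


# Abbreviated from the checknode and pbsnodes output above.
CHECKNODE = """checking node compute-0-141
State: Down (in current state for 00:00:00)
Configured Resources: PROCS: 8 MEM: 15G SWAP: 16G DISK: 1M
"""

PBSNODES = """compute-0-141
state = free
np = 8
ntype = cluster
"""

if __name__ == "__main__":
    m, t = maui_state(CHECKNODE), torque_state(PBSNODES)
    if m is not None and t is not None:
        print(f"maui={m} torque={t} disagree={states_disagree(m, t)}")
```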
Any help with diagnosing this would be highly appreciated. Kindly help!
-Sudarshan Wadkar
System Administrator
High Performance Computing Center
IIT Bombay, Powai, Mumbai 76
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers