Dear All,
We are facing a strange problem with a Rocks 5.3 + Torque (2.4.6) / Maui
(3.2.6p21) installation.
Some nodes are shown in state Down by Maui:

[root@norma ~]# checknode compute-0-141
checking node compute-0-141
State:      Down  (in current state for 00:00:00)
Configured Resources: PROCS: 8  MEM: 15G  SWAP: 16G  DISK: 1M
Utilized   Resources: PROCS: 8
Dedicated  Resources: [NONE]
Opsys:         linux  Arch:      [NONE]
Speed:      1.00  Load:       0.120
Network:    [DEFAULT]
Features:   [NONE]
Attributes: [Batch]
Classes:    [default 8:8]

Total Time:   INFINITY  Up: 46:00:40:56 (32.37%)  Active: 24:21:46:44 (17.51%)

Reservations:
NOTE:  no reservations on node

Meanwhile pbsnodes, Torque's monitoring utility, shows the same node in the free state:
[root@norma ~]# pbsnodes compute-0-141
compute-0-141
     state = free
     np = 8
     ntype = cluster
     status = opsys=linux,uname=Linux compute-0-141.local 2.6.18-164.6.1.el5 #1 SMP Tue Nov 3 16:12:36 EST 2009 x86_64,sessions=? 0,nsessions=? 0,nusers=0,idletime=109343,totmem=17458824kb,availmem=17312400kb,physmem=16438708kb,ncpus=8,loadave=0.06,message=ERROR: prolog/epilog failed,file: /opt/torque/mom_priv/epilogue.parallel,exit: 1,nonzero p/e exit status,netload=21338837309,state=free,jobs=,varattr=,rectime=1298041707
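
The status line above is hard to read as one long string, so here is a rough Python sketch of how it could be split into fields to make the message attribute stand out. This is not part of Torque or Maui; the helper name parse_status is made up for illustration. It assumes that attributes are comma-separated key=value pairs and that any comma-separated fragment without an "=" belongs to the preceding value. The sample string is a shortened copy of the output above.

```python
def parse_status(status):
    """Split a pbsnodes status string into a dict of attribute -> value.

    Assumption: a fragment with no '=' (for example 'file: /opt/...')
    is a continuation of the previous attribute's value.
    """
    fields = {}
    last_key = None
    for token in status.split(","):
        key, sep, value = token.partition("=")
        if sep and key.strip().isidentifier():
            # A new attribute such as 'state=free' or 'jobs='
            last_key = key.strip()
            fields[last_key] = value
        elif last_key is not None:
            # Continuation of the previous value; restore the comma
            fields[last_key] += "," + token
    return fields


# Shortened copy of the status line reported by pbsnodes for compute-0-141
STATUS = (
    "opsys=linux,sessions=? 0,nsessions=? 0,nusers=0,ncpus=8,loadave=0.06,"
    "message=ERROR: prolog/epilog failed,file: "
    "/opt/torque/mom_priv/epilogue.parallel,exit: 1,nonzero p/e exit status,"
    "netload=21338837309,state=free,jobs=,varattr=,rectime=1298041707"
)

fields = parse_status(STATUS)
print("state  :", fields["state"])
print("message:", fields.get("message", "(none)"))
```

Read this way, the node reports state=free to Torque but also carries a message attribute recording a failed epilogue.parallel script with exit status 1.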

The nodes are running the pbs_mom service:

[root@norma ~]# ssh compute-0-141 service pbs status
pbs_mom is pid 3876

Any help with diagnosis would be highly appreciated.

-Sudarshan Wadkar
System Administrator
High Performance Computing Center
IIT Bombay, Powai,  Mumbai 76
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
