Please post your Maui configuration file and your Torque setup.


On 9/14/2011 10:32 AM, Arnau Bria wrote:
Hi all,

We recently updated Torque to 2.5.8, but since I think this is a Maui
issue, I'm asking here first.

Our combination is:
# rpm -qa|egrep 'maui-server|torque-server'
maui-server-3.3-1.x86_64
torque-server-2.5.8-1.cri.x86_64

Maui works fine, but if it finds a node in "busy" state during a scheduling
cycle, it does not schedule any other job in that cycle:

09/14 16:20:48 INFO:     job '20420500' successfully started
09/14 16:20:48 MRMJobStart(20420265,Msg,SC)
09/14 16:20:48 MPBSJobStart(20420265,base,Msg,SC)
09/14 16:20:48 ERROR:    job '20420265' cannot be started: (rc: 15046  errmsg: 
'Resource temporarily unavailable REJHOST=td578.pic.es MSG=cannot allocate node 
'td578.pic.es' to job - node not currently available (state: busy)'  hostlist: 
'td578.pic.es')
09/14 16:20:48 ERROR:    cannot start job '20420265' in partition DEFAULT
09/14 16:20:48 MJobPReserve(20420265,DEFAULT,ResCount,ResCountRej)
09/14 16:20:48 INFO:     no priority reservations created (bf/rsv policy)
09/14 16:20:48 MRMJobStart(20420306,Msg,SC)
09/14 16:20:48 MPBSJobStart(20420306,base,Msg,SC)
09/14 16:20:48 ERROR:    job '20420306' cannot be started: (rc: 15046  errmsg: 
'Resource temporarily unavailable REJHOST=td578.pic.es MSG=cannot allocate node 
'td578.pic.es' to job - node not currently available (state: busy)'  hostlist: 
'td578.pic.es')
09/14 16:20:48 ERROR:    cannot start job '20420306' in partition DEFAULT
09/14 16:20:48 MJobPReserve(20420306,DEFAULT,ResCount,ResCountRej)
09/14 16:20:48 INFO:     no priority reservations created (bf/rsv policy)
09/14 16:20:48 MRMJobStart(20420268,Msg,SC)
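To see whether it is always the same node that blocks the cycle, something like the following could tally the "state: busy" rejections per host. This is only a sketch: `busy_rejections` is a hypothetical helper name, and it assumes each entry sits on a single line in maui.log (the excerpt above was wrapped by the mail client).

```shell
#!/bin/sh
# Hypothetical helper: read a Maui log on stdin and count how many times
# each host was rejected by pbs_server with "state: busy".
# Assumes one log entry per line (unwrapped maui.log).
busy_rejections() {
    grep 'state: busy' \
        | grep -o 'REJHOST=[^ ]*' \
        | cut -d= -f2 \
        | sort | uniq -c | sort -rn
}
```

Usage: `busy_rejections < /path/to/maui.log` (adjust the path to wherever your Maui log lives).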



Torque says that the node is busy:

09/14/2011 03:00:13;0008;PBS_Server;Job;20401629.pbs03.pic.es;could not locate 
requested resources 'td578.pic.es' (node_spec failed) cannot allocate node 
'td578.pic.es' to job - node not currently available (state: busy)
09/14/2011 03:00:13;0080;PBS_Server;Req;req_reject;Reject reply 
code=15046(Resource temporarily unavailable REJHOST=td578.pic.es MSG=cannot 
allocate node 'td578.pic.es' to job - node not currently available (state: 
busy)), aux=0, type=RunJob, from [email protected]
09/14/2011 03:00:13;0008;PBS_Server;Job;20401630.pbs03.pic.es;could not locate 
requested resources 'td578.pic.es' (node_spec failed) cannot allocate node 
'td578.pic.es' to job - node not currently available (state: busy)


but the node is not really "busy". It is only busy for a few seconds:
after I see the error (with a delay of 3-4 seconds), I run
pbsnodes $nodename and it shows as free.
On the next scheduling cycle, if Maui does not find any "busy" node, all jobs
are scheduled.
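Since the busy window seems to last only a few seconds, polling the node right after a rejection may catch it. A minimal sketch, assuming the usual `pbsnodes <node>` output format with an indented `state = ...` line; `node_state` and `watch_node` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: print the "state" value from pbsnodes output on stdin.
# Assumes lines of the form "     state = free".
node_state() {
    awk -F' = ' '$1 ~ /^[[:space:]]*state$/ {print $2; exit}'
}

# Hypothetical helper: print the node's state once per second for N seconds
# (default 10), with a timestamp, to see how long "busy" lasts.
watch_node() {
    node=$1
    secs=${2:-10}
    i=0
    while [ "$i" -lt "$secs" ]; do
        printf '%s %s\n' "$(date +%T)" "$(pbsnodes "$node" | node_state)"
        sleep 1
        i=$((i + 1))
    done
}
```

Usage: `watch_node td578.pic.es 10`.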


I'm wondering whether I could configure Maui to skip those failing nodes
and keep scheduling other jobs while I figure out why Torque marks those
nodes as busy when they are not.


TIA,
Arnau
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
