Hi all,

We recently updated Torque to 2.5.8 but, since this looks like a Maui issue, I am asking here first.
Our combination is:

# rpm -qa|egrep 'maui-server|torque-server'
maui-server-3.3-1.x86_64
torque-server-2.5.8-1.cri.x86_64

Maui works fine, but if it finds a node in busy state during a scheduling cycle, it does not schedule any other job in that cycle:

09/14 16:20:48 INFO: job '20420500' successfully started
09/14 16:20:48 MRMJobStart(20420265,Msg,SC)
09/14 16:20:48 MPBSJobStart(20420265,base,Msg,SC)
09/14 16:20:48 ERROR: job '20420265' cannot be started: (rc: 15046 errmsg: 'Resource temporarily unavailable REJHOST=td578.pic.es MSG=cannot allocate node 'td578.pic.es' to job - node not currently available (state: busy)' hostlist: 'td578.pic.es')
09/14 16:20:48 ERROR: cannot start job '20420265' in partition DEFAULT
09/14 16:20:48 MJobPReserve(20420265,DEFAULT,ResCount,ResCountRej)
09/14 16:20:48 INFO: no priority reservations created (bf/rsv policy)
09/14 16:20:48 MRMJobStart(20420306,Msg,SC)
09/14 16:20:48 MPBSJobStart(20420306,base,Msg,SC)
09/14 16:20:48 ERROR: job '20420306' cannot be started: (rc: 15046 errmsg: 'Resource temporarily unavailable REJHOST=td578.pic.es MSG=cannot allocate node 'td578.pic.es' to job - node not currently available (state: busy)' hostlist: 'td578.pic.es')
09/14 16:20:48 ERROR: cannot start job '20420306' in partition DEFAULT
09/14 16:20:48 MJobPReserve(20420306,DEFAULT,ResCount,ResCountRej)
09/14 16:20:48 INFO: no priority reservations created (bf/rsv policy)
09/14 16:20:48 MRMJobStart(20420268,Msg,SC)

Torque says that the node is busy:

09/14/2011 03:00:13;0008;PBS_Server;Job;20401629.pbs03.pic.es;could not locate requested resources 'td578.pic.es' (node_spec failed) cannot allocate node 'td578.pic.es' to job - node not currently available (state: busy)
09/14/2011 03:00:13;0080;PBS_Server;Req;req_reject;Reject reply code=15046(Resource temporarily unavailable REJHOST=td578.pic.es MSG=cannot allocate node 'td578.pic.es' to job - node not currently available (state: busy)), aux=0, type=RunJob, from [email protected]
09/14/2011 03:00:13;0008;PBS_Server;Job;20401630.pbs03.pic.es;could not locate requested resources 'td578.pic.es' (node_spec failed) cannot allocate node 'td578.pic.es' to job - node not currently available (state: busy)

but the node is not really "busy". It is only busy for a few seconds: when I run pbsnodes $nodename 3-4 seconds after seeing the error, the node shows as free. On the next scheduling cycle, if Maui does not find any "busy" node, all jobs are scheduled.

I'm wondering whether I could configure Maui to skip those failing nodes and keep scheduling other jobs while I figure out why Torque marks those nodes as busy when they are not.

TIA,
Arnau
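P.S. To see which nodes trigger these rejections and how often, the Maui log can be tallied per host. The following is only a minimal sketch: the regular expression is inferred from the "cannot be started" lines quoted above, and the helper name rejected_hosts is made up for illustration, so other Maui versions or log levels may need a different pattern.

```python
import re
from collections import Counter

# Pattern inferred from the quoted Maui log lines (an assumption, not an
# official log format): captures the job id, the rejecting host, and the
# node state that Torque reported.
REJECT_RE = re.compile(
    r"ERROR:\s+job '(?P<job>\d+)' cannot be started:"
    r".*?REJHOST=(?P<host>\S+)"
    r".*?\(state: (?P<state>\w+)\)"
)


def rejected_hosts(lines):
    """Count job-start rejections per (host, state) in Maui log lines."""
    counts = Counter()
    for line in lines:
        match = REJECT_RE.search(line)
        if match:
            counts[(match.group("host"), match.group("state"))] += 1
    return counts
```

Passing it an open log file, for example rejected_hosts(open(path)) where path is wherever your Maui log is written, returns a Counter keyed by (host, state). A host that shows up with many "busy" rejections is a candidate for a closer look with pbsnodes.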
