[Mauiusers] trouble with classlist

Jon Wright Mon, 16 Mar 2009 20:34:25 -0700

Hi,

We have multiple sets of nodes and queues, and as far as possible we try to push jobs from one queue to a certain set of nodes first and, if those are all busy, to another set.


queue parallel -> d,f,k nodes (10 of each, total 30)
queue medium64 -> a and l nodes + temp1 (total of 25)

In the past we have used the SRCFG for this as below:

# Tie E4400, E2160 and E6600 machines to the medium64 queue

SRCFG[medium64]    HOSTLIST=a01,a02,a03,a04,a05,a06,a07,a08,a09,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,l01,l02,l03,l04,temp1

SRCFG[medium64]    CLASSLIST=medium64
SRCFG[medium64]    PERIOD=INFINITY
SRCFG[medium64]    RESOURCES=PROCS:-1

# Tie the parallel queue to the quad core phenoms cpus

SRCFG[parallel]    HOSTLIST=f01,f02,f03,f04,f05,f06,f07,f08,f09,f10,d01,d02,d03,d04,d05,d06,d07,d08,d09,d10,k01,k02,k03,k04,k05,k06,k07,k08,k09,k10
SRCFG[parallel]    CLASSLIST=parallel,medium64-
SRCFG[parallel]    PERIOD=INFINITY
SRCFG[parallel]    RESOURCES=PROCS:-1
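As a sanity check on the two host lists, here is a minimal Python sketch that counts the hosts in each reservation and looks for any host listed in both. The strings are copied from the HOSTLIST lines above; the dictionary and variable names are only for illustration and are not part of Maui.

```python
# Host lists copied from the two SRCFG HOSTLIST lines above.
reservations = {
    "medium64": "a01,a02,a03,a04,a05,a06,a07,a08,a09,a10,"
                "a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,"
                "l01,l02,l03,l04,temp1",
    "parallel": "f01,f02,f03,f04,f05,f06,f07,f08,f09,f10,"
                "d01,d02,d03,d04,d05,d06,d07,d08,d09,d10,"
                "k01,k02,k03,k04,k05,k06,k07,k08,k09,k10",
}

# Split each list on commas, dropping empty entries left by stray commas.
hosts = {
    name: [h for h in line.split(",") if h]
    for name, line in reservations.items()
}

for name, names in hosts.items():
    print(name, len(names), "hosts")

# A host appearing in both reservations would be a configuration mistake.
overlap = set(hosts["medium64"]) & set(hosts["parallel"])
print("hosts in both reservations:", sorted(overlap))
```

The counts come out as 25 for medium64 and 30 for parallel, matching the totals given for the two queues, and no host appears in both lists.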

However, what is now happening (maui-3.2.21) is that any jobs submitted to the medium64 queue are always sent to the f, d or k nodes first and not to the a machines.

In fact, when considering the nodes, maui does not even consider the a machines to be available:

03/17 11:01:40 INFO:     processing node request line '1:ppn=1'

03/17 11:01:40 INFO: job '343129' loaded: 1 jon staff1209600 Idle 0 1237258899 [NONE] [NONE] [NONE] >= 0 >=      0 [NONE] 1237258900
03/17 11:01:40 INFO:     15 PBS jobs detected on RM vanguard
03/17 11:01:40 INFO:     jobs detected: 15
03/17 11:01:40 INFO:     total jobs selected (ALL): 1/15 [State: 14]
03/17 11:01:40 INFO:     total jobs selected (ALL): 1/15 [State: 14]
03/17 11:01:40 INFO:     total jobs selected in partition ALL: 1/1
03/17 11:01:40 MQueueScheduleRJobs(Q)
03/17 11:01:40 INFO:     total jobs selected in partition ALL: 1/1
03/17 11:01:40 INFO:     total jobs selected in partition DEFAULT: 1/1
03/17 11:01:40 MQueueScheduleIJobs(Q,DEFAULT)

03/17 11:01:40 INFO: 370 feasible tasks found for job 343129:0 in partition DEFAULT (1 Needed)
03/17 11:01:40 INFO: tasks located for job 343129: 1 of 1 required (120 feasible)

03/17 11:01:40 MJobStart(343129)
03/17 11:01:40 MRMJobStart(343129,Msg,SC)
03/17 11:01:40 MPBSJobStart(343129,vanguard,Msg,SC)
03/17 11:01:40 MPBSJobModify(343129,Resource_List,Resource,k10)
03/17 11:01:40 MPBSJobModify(343129,Resource_List,Resource,1:ppn=1)
03/17 11:01:40 INFO:     job '343129' successfully started
03/17 11:01:40 INFO:     starting job '343129'
03/17 11:01:40 INFO:     1 jobs started on iteration 2
Active Jobs------

The "120 feasible" indicates that the a machines are not being considered, because 120 is the number of CPUs available from the 30 d, k and f machines.
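The arithmetic behind that observation can be sketched in a few lines of Python. This pulls the feasible count out of the "tasks located" log line and compares it with the CPU count of the parallel nodes. The figure of 4 cores per node is an assumption taken from the "quad core" comment in the config; the variable names are illustrative only.

```python
import re

# The "tasks located" line from the log excerpt above.
log_line = ("03/17 11:01:40 INFO: tasks located for job 343129: "
            "1 of 1 required (120 feasible)")

# Extract the number reported as feasible.
match = re.search(r"\((\d+)\s+feasible\)", log_line)
feasible = int(match.group(1))

parallel_nodes = 30   # 10 each of the d, f and k nodes
cores_per_node = 4    # assumption: all of these nodes are quad core
parallel_cpus = parallel_nodes * cores_per_node

print("feasible tasks:", feasible)
print("CPUs on d/f/k nodes:", parallel_cpus)
print("match:", feasible == parallel_cpus)
```

If the a and l nodes and temp1 were being counted as well, the feasible figure would have to be larger than the total for the d, f and k nodes alone.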

Now this used to work in the past; the NODEALLOCATIONPOLICY is set to MINRESOURCE and BACKFILL to BESTFIT. We have another couple of queues also linked in a similar manner and they seem to be working fine, but in this case it just doesn't work as I expect it to. Obviously I have something wrong, but any help would be appreciated.


Jon
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
