Hello
I am having trouble getting our Torque/Maui setup to apply a policy for how
jobs are allocated to nodes in our cluster. I have browsed the mailing list
archive but found no answers, so I am hoping for some suggestions.
Firstly, the cluster consists of three node types:
type1: 2 CPUs, fast I/O system
type2: 2 CPUs
type3: 4 CPUs
I have a "serial-io" queue that restricts jobs to type1 nodes, and a
default queue for all other jobs.
Getting the "serial-io" queue to work was straightforward. The problem is
with the default queue and how its jobs are allocated. The allocation
policy I would like is as follows:
(A) If a job requests more than 2 CPUs per node, use type3 nodes.
[This always works.]
(B) If a job requests only one node (max. 2 CPUs), it can be allocated
to, in order of preference, type2, type3 and type1.
(C) If a job requests multiple nodes (max. 2 CPUs per node), it can be
allocated to, in order of preference, type2 then type3.
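To make the intended policy unambiguous, here is a minimal Python sketch of rules (A) to (C) for jobs in the default queue. The function name is hypothetical and serial-io jobs are deliberately left out; it only encodes the preference order stated above, not anything Maui does internally.

```python
def preferred_node_types(nodes: int, cpus_per_node: int) -> list[str]:
    """Return eligible node types for a default-queue job, most preferred first.

    Hypothetical helper encoding policies (A)-(C); serial-io jobs are not covered.
    """
    if nodes < 1 or cpus_per_node < 1:
        raise ValueError("a job needs at least one node and one CPU")
    if cpus_per_node > 4:
        raise ValueError("no node type has more than 4 CPUs")
    # (A) more than 2 CPUs per node: only the 4-CPU type3 nodes fit
    if cpus_per_node > 2:
        return ["type3"]
    # (B) single-node job with at most 2 CPUs: type2, then type3, then type1
    if nodes == 1:
        return ["type2", "type3", "type1"]
    # (C) multi-node job with at most 2 CPUs per node: type2, then type3
    return ["type2", "type3"]
```

For example, a 1-node, 2-CPU job should land on type2 nodes first, and only fall back to type3 and then type1 when those are busy.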
I think I can assume that policy A will always work, or else Maui is
seriously broken. Policies B and C, however, are very difficult to get
working.
What I have so far:
The default Torque queue routes jobs to two execution queues: "single"
for single-node jobs (policy B) and "normal" for multi-node jobs
(policy C). This works reliably.
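The routing decision itself is simple; the Python sketch below only illustrates it and is not how Torque implements routing (that is set up in the queue configuration). The function name is hypothetical, and it assumes a Torque-style nodes request made only of counted parts such as "1", "2:ppn=2" or "1:ppn=2+1:ppn=2" (no explicit host names).

```python
def route_default_queue(nodes_spec: str) -> str:
    """Pick the execution queue for a job submitted to the default queue.

    Hypothetical sketch: assumes every '+'-separated part of the nodes
    request starts with a node count, e.g. "2:ppn=2".
    """
    node_count = 0
    for part in nodes_spec.split("+"):
        # "2:ppn=2" -> count "2"; any properties after ':' are ignored here
        count, _, _props = part.partition(":")
        node_count += int(count)
    # single-node jobs go to "single" (policy B), the rest to "normal" (policy C)
    return "single" if node_count == 1 else "normal"
```

Note that a 1-node, 4-CPU request also ends up in "single"; under policy A such a job can only fit on a type3 node anyway.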
The problem is that Maui tends to allocate jobs to the type3 nodes in
preference to the type2 nodes regardless of what I do. We only have two
type3 nodes, so I want 2-CPU/node jobs to use them only as a last resort.
However, we have many more 2-CPU/node jobs than 4-CPU/node jobs, so I
don't want to exclude type3 nodes altogether.
Current Maui configuration (summary):
Partitions:
type1 nodes are in partition "serial"
type2/type3 nodes are in partition "normal"
Standing Reservations:
SRCFG[type1] HOSTLIST=n0[1-5] CLASSLIST=serial-io,single- NODEFEATURES=type1 PERIOD=INFINITY
SRCFG[type3] HOSTLIST=n33,n34 CLASSLIST=normal-,single- NODEFEATURES=type3 PERIOD=INFINITY
Class config:
CLASSCFG[serial-io] PDEF=serial DEFAULT.FEATURES=type1 PLIST=serial
CLASSCFG[single] PDEF=normal PLIST=normal,serial
CLASSCFG[normal] PDEF=normal PLIST=normal
I have confirmed that the configuration above is applied correctly. We
also have fair-share scheduling enabled, and that appears to work too.
Node allocation policy (currently):
NODEALLOCATIONPOLICY PRIORITY
NODECFG[DEFAULT] PARTITION=normal PRIORITY=1000 PRIORITYF='PRIORITY + PREF + RESAFFINITY'
# For each type 1 node
NODECFG[XX] PARTITION=serial PRIORITY=10
# For each type 2 node
NODECFG[XX] PARTITION=normal PRIORITY=1000
# For each type 3 node
NODECFG[XX] PARTITION=normal PRIORITY=100
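The order these settings are meant to produce can be checked with a small Python model. This is not Maui's allocation algorithm: it assumes nodes are ranked only by their static PRIORITY values within the partitions a class may use, ignoring the PREF and RESAFFINITY terms and current node load. The dictionaries and function name are hypothetical; the values are copied from the summary above.

```python
# Hypothetical model of the cluster summarised above; node types stand in
# for the real host names (n01-n05 are type1, n33/n34 are type3).
NODE_TYPES = {
    # type: (partition, configured PRIORITY)
    "type1": ("serial", 10),
    "type2": ("normal", 1000),
    "type3": ("normal", 100),
}

# Partitions each class may use, taken from the CLASSCFG PLIST values.
CLASS_PARTITIONS = {
    "serial-io": {"serial"},
    "single": {"normal", "serial"},
    "normal": {"normal"},
}


def expected_order(job_class: str) -> list[str]:
    """Node types a class can reach, highest configured PRIORITY first.

    Assumes allocation follows the static PRIORITY values alone.
    """
    allowed = CLASS_PARTITIONS[job_class]
    eligible = [t for t, (part, _) in NODE_TYPES.items() if part in allowed]
    return sorted(eligible, key=lambda t: NODE_TYPES[t][1], reverse=True)
```

Under that assumption, "single" jobs would get type2, type3, type1 in that order and "normal" jobs type2 then type3, which matches policies B and C; the problem is that Maui does not allocate in this order in practice.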
I have experimented with the different NODEALLOCATIONPOLICY settings,
especially PRIORITY together with the node PRIORITYF function, but nothing
seems to change the preference for allocating jobs to type3 nodes over
type2 (or type1) nodes.
Additionally, I am not sure I need the partitions at all; I would like
the configuration to be as simple as possible.
Any comments or suggestions would be appreciated.
Cheers
Justin
--
Dr Justin Finnerty
Rm W3-1-218 Ph 49 (441) 798 3726
Carl von Ossietzky Universität Oldenburg
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers