Hi,
My problem is that less than half the capacity of our nodes is being used.
We have two 64-core nodes. At any one time, no more than about 50 jobs are
scheduled.
Looking at a job that's queued I get:
$ checkjob 22360
checking job 22360
State: Idle
Creds: user:bckhouse group:minos class:minos qos:DEFAULT
WallTime: 00:00:00 of INFINITY
SubmitTime: Tue May 14 06:11:10
(Time Queued Total: 5:24:43 Eligible: 5:24:43)
Total Tasks: 1
Req[0] TaskCount: 1 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [NONE]
Dedicated Resources Per Task: PROCS: 1 MEM: 2000M
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 0
PartitionMask: [ALL]
Flags: RESTARTABLE
Reservation '22360' (00:00:00 -> INFINITY Duration: INFINITY)
PE: 1.00 StartPriority: 324
job cannot run in partition DEFAULT (insufficient idle procs available: 0 < 1)
So, the problem is there are no idle procs.
Checking one of the nodes:
$ checknode node078
checking node node078
State: Busy (in current state for 00:32:08)
Expected State: Running SyncDeadline: Tue May 14 11:37:10
Configured Resources: PROCS: 64 MEM: 252G SWAP: 252G DISK: 558G
Utilized Resources: PROCS: 64 SWAP: 17G DISK: 32M
Dedicated Resources: PROCS: 24 MEM: 46G
Opsys: linux Arch: [NONE]
Speed: 1.00 Load: 23.180
Network: [DEFAULT]
Features: [amd][MEM256G]
Attributes: [Batch]
Classes: [batch 64:64][minos 40:64]
Total Time: INFINITY Up: INFINITY (99.66%) Active: 27:15:36:09 (20.67%)
<snip a bunch of reservations corresponding to the running jobs>
So Maui knows it has only scheduled ("dedicated") 24 procs, but it thinks
all 64 are utilized.
This obviously isn't true:
node078:~$ uptime
11:38:13 up 16:45, 2 users, load average: 22.97, 23.24, 23.97
So the load is only the 24 jobs that are running, not 64. Likewise, the
node looks half-loaded in ganglia.
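
To put the three numbers side by side, here is a minimal Python sketch that
pulls the PROCS counts out of the checknode lines above and the 1-minute
load out of the uptime line. The helper names (procs_by_kind,
one_minute_load) are made up for illustration, and the regular expressions
only assume the output layout pasted in this mail.

```python
import re

# Lines copied from the checknode and uptime output above.
CHECKNODE = """\
Configured Resources: PROCS: 64 MEM: 252G SWAP: 252G DISK: 558G
Utilized Resources: PROCS: 64 SWAP: 17G DISK: 32M
Dedicated Resources: PROCS: 24 MEM: 46G
"""
UPTIME = "11:38:13 up 16:45, 2 users, load average: 22.97, 23.24, 23.97"


def procs_by_kind(text):
    """Map 'Configured'/'Utilized'/'Dedicated' to the PROCS count on that line."""
    pattern = re.compile(r"^(\w+) Resources:.*?PROCS:\s*(\d+)", re.MULTILINE)
    return {kind: int(count) for kind, count in pattern.findall(text)}


def one_minute_load(uptime_line):
    """Return the first (1-minute) load average from an uptime line."""
    match = re.search(r"load average:\s*([\d.]+)", uptime_line)
    return float(match.group(1))


procs = procs_by_kind(CHECKNODE)
load = one_minute_load(UPTIME)

print("procs:", procs)
print("1-minute load:", load)
print("utilized minus dedicated:", procs["Utilized"] - procs["Dedicated"])
```

The load tracks the dedicated count (about 23 against 24), while the
"utilized" count sits at the configured maximum, 40 procs higher.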
So, how is the "utilized" number determined? The docs make it sound like
it's just the load average, but that doesn't seem to be the case.
I set "NODEAVAILABILITYPOLICY DEDICATED:PROCS" in maui.cfg on the head
node, in an attempt to get maui to ignore the "utilized" number, but it
doesn't seem to have had any effect.
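
For reference, this is how I understand the three policy values. The sketch
below is my reading of the docs, not Maui's actual code: I am assuming that
DEDICATED counts only dedicated procs, UTILIZED counts only utilized procs,
and COMBINED (which I believe is the default) takes the more pessimistic of
the two. The function name idle_procs is made up.

```python
def idle_procs(configured, dedicated, utilized, policy="COMBINED"):
    """Idle procs on a node under an assumed model of NODEAVAILABILITYPOLICY."""
    if policy == "DEDICATED":
        used = dedicated
    elif policy == "UTILIZED":
        used = utilized
    elif policy == "COMBINED":
        # Assumption: whichever count is larger wins.
        used = max(dedicated, utilized)
    else:
        raise ValueError(f"unknown policy: {policy}")
    return max(configured - used, 0)


# Numbers from the checknode output for node078.
for policy in ("COMBINED", "UTILIZED", "DEDICATED"):
    print(policy, idle_procs(64, 24, 64, policy))
```

Under this model COMBINED and UTILIZED both give 0 idle procs, which matches
the "0 < 1" message from checkjob, while DEDICATED would give 40. That is
also the figure checknode shows for the minos class ([minos 40:64]), so the
setting should have helped if it were being applied.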
Additional information: I can get the nodes full if I "surprise" them by
submitting 128 jobs simultaneously. Those jobs run happily. That's the
state I want to be able to obtain in the general case.
Watching the load on ganglia, it looks like the jobs drain out of the
system as they complete, and then every 30 minutes they're topped back
up to the ~half utilization level. My guess is that this is driven by
the "defer" mechanism.
Sorry if this problem has been answered before. I found several
instances of what sound like the same thing in the mail archives, but no
solutions :(
Thanks - Chris