I have a 127-node cluster running Rocks 5.0 with Torque 2.3.0 and
Maui 3.2.6 (yes, I realize this is pretty old). Each node has dual
quad-core Xeon CPUs and 32GB of RAM.
Sixteen of the nodes have an NVIDIA Tesla GPU installed. I want
only jobs that need the GPU to run on those nodes, and only one such job
at a time, since the GPU can only be owned by one process at a time.
So I defined a property 'GPU' on all the nodes with GPUs and a property
'nonGPU' on all the nodes without. I created a queue called 'GPU' with
resources_default.neednodes = GPU, and on all our other queues I set
resources_default.neednodes = nonGPU.
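For reference, the setup looks roughly like this (the node and queue
names here are illustrative, not our actual ones):

  # $TORQUE_HOME/server_priv/nodes -- properties assigned per node
  compute-0-0 np=8 GPU
  compute-0-1 np=8 nonGPU

  # queue defaults set via qmgr
  qmgr -c "set queue GPU resources_default.neednodes = GPU"
  qmgr -c "set queue batch resources_default.neednodes = nonGPU"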
I have told all our users running GPU jobs to submit with:

  -q GPU -l nodes=1:ppn=8:GPU
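So a full submission looks something like this (the script name is just
an example):

  qsub -q GPU -l nodes=1:ppn=8:GPU gpu_job.sh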
The problem I am having is that when
1) all the GPU nodes are busy,
2) there are more GPU jobs queued,
3) all nonGPU nodes have at least one job running on them
   but still have free processor slots (i.e. they are not job-exclusive), and
4) there are nonGPU jobs waiting to run at lower priority than the
   queued GPU jobs,
the waiting nonGPU jobs do not get run, even though there are plenty
of free processor slots left on the nonGPU nodes.
For example, here is a section of showq output that shows this. The
'mreuter' jobs are nonGPU ones; the 'rge21' jobs are GPU ones. You can see
that the 'mreuter' jobs 1154576 and up are not being run even though there
are plenty of free nonGPU CPUs.
# showq
.....
1154571 mreuter Running 1 3:23:33:50 Thu Mar 10 17:31:06
1154572 mreuter Running 1 3:23:33:50 Thu Mar 10 17:31:06
1154573 mreuter Running 1 3:23:33:50 Thu Mar 10 17:31:06
1154575 mreuter Running 1 3:23:33:50 Thu Mar 10 17:31:06
493 Active Jobs 602 of 984 Processors Active (61.18%)
123 of 123 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1153960 rge21 Idle 8 2:00:00 Thu Mar 10 12:12:46
1153961 rge21 Idle 8 2:00:00 Thu Mar 10 12:12:47
1153962 rge21 Idle 8 2:00:00 Thu Mar 10 12:12:49
1153963 rge21 Idle 8 2:00:00 Thu Mar 10 12:12:53
1154576 mreuter Idle 1 4:00:00:00 Thu Mar 10 16:24:36
1154577 mreuter Idle 1 4:00:00:00 Thu Mar 10 16:24:50
1154578 mreuter Idle 1 4:00:00:00 Thu Mar 10 16:25:04
1154579 mreuter Idle 1 4:00:00:00 Thu Mar 10 16:25:18
....
As a temporary fix for this, I run:

  showq | grep rge21 | grep Idle | cut -d' ' -f1 | xargs qhold

wait a minute, and then run:

  showq | grep rge21 | grep Hold | cut -d' ' -f1 | xargs qrls

After that, all the nonGPU jobs that were held up get run.
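(The same thing in a form that is a bit more robust against whitespace
in the showq output; note that xargs -r is a GNU extension:

  showq | awk '$2 == "rge21" && $3 == "Idle" {print $1}' | xargs -r qhold
  sleep 60   # wait a minute
  showq | awk '$2 == "rge21" && $3 == "Hold" {print $1}' | xargs -r qrls
)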
As I mentioned, this only seems to happen when all nonGPU nodes have at
least one processor assigned to a job. I would understand why that would
hold up jobs if those rge21 jobs were nonGPU jobs. So it seems Maui is not
properly taking the node properties into account along with the neednodes
resource of jobs when scheduling.
Is Maui simply incapable of dealing with this situation, or is there
some configuration I can use to handle it?
A few lines from maui.cfg that might be relevant:
QUEUETIMEWEIGHT 1
CLASSWEIGHT 10
USERCFG[DEFAULT] MAXIPROC=32
BACKFILLPOLICY FIRSTFIT
RESERVATIONPOLICY CURRENTHIGHEST
NODEALLOCATIONPOLICY PRIORITY
NODECFG[DEFAULT] PRIORITY=1000 PRIORITYF='PRIORITY + .01 * AMEM - 10 * JOBCOUNT - 2 * USAGE'
--
---------------------------------------------------------------
Paul Raines email: raines at nmr.mgh.harvard.edu
MGH/MIT/HMS Athinoula A. Martinos Center for Biomedical Imaging
149 (2301) 13th Street Charlestown, MA 02129 USA