I have 7 dual-CPU machines (14 processors in total) running Maui/Torque
jobs 24/7. After implementing my first set of processor limits, things
are not working as I had planned. I have USERCFG[DEFAULT] MAXPROC=4,14
set in maui.cfg (soft limit 4, hard limit 14). The only jobs in the
queue belong to user0000001 and user000002, and 3 of the 14 CPUs are
still free, yet user0000001's remaining jobs are blocked. user0000001
currently has 8 of the 14 processors and user000002 has 3. Why can't
user0000001 take the remaining 3 if nothing else is waiting? Did I
misread something?
Current showq output:
ACTIVE JOBS--------------------
JOBNAME  USERNAME     STATE    PROC   REMAINING  STARTTIME
626      user0000001  Running     1    10:40:59  Wed Mar 29 16:12:46
644      user0000001  Running     1    16:10:33  Wed Mar 29 21:42:20
645      user0000001  Running     1    16:14:18  Wed Mar 29 21:46:05
646      user0000001  Running     1    16:19:33  Wed Mar 29 21:51:20
642      user0000001  Running     1    20:06:24  Wed Mar 29 20:38:11
635      user000002   Running     1    21:01:56  Thu Mar 30 13:33:43
627      user0000001  Running     1    21:33:27  Thu Mar 30 03:05:14
628      user0000001  Running     1    23:51:51  Thu Mar 30 05:23:38
629      user0000001  Running     1    23:58:37  Thu Mar 30 05:30:24
648      user000002   Running     1  1:09:04:11  Thu Mar 30 13:35:58
649      user000002   Running     1  1:09:04:26  Thu Mar 30 13:36:13
11 Active Jobs    11 of 14 Processors Active (78.57%)
                   6 of  7 Nodes Active      (85.71%)
IDLE JOBS----------------------
JOBNAME  USERNAME     STATE    PROC     WCLIMIT  QUEUETIME
0 Idle Jobs
BLOCKED JOBS----------------
JOBNAME  USERNAME     STATE    PROC     WCLIMIT  QUEUETIME
630      user0000001  Idle        1  1:11:00:00  Wed Mar 29 03:00:48
631      user0000001  Idle        1  1:11:00:00  Wed Mar 29 03:02:00
632      user0000001  Idle        1  1:11:00:00  Wed Mar 29 03:03:18
633      user0000001  Idle        1  1:11:00:00  Wed Mar 29 03:04:53
Total Jobs: 15   Active Jobs: 11   Idle Jobs: 0   Blocked Jobs: 4
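As a cross-check of the per-user counts quoted above, here is a minimal
Python sketch that tallies the PROC column of the active jobs by user.
The rows are copied by hand from the showq listing (job, user, state,
procs only); the function name procs_per_user is hypothetical, and this
is not a general showq parser.

```python
from collections import Counter

# Active-job rows copied from the showq output (job, user, state, procs).
ACTIVE = """\
626 user0000001 Running 1
644 user0000001 Running 1
645 user0000001 Running 1
646 user0000001 Running 1
642 user0000001 Running 1
635 user000002 Running 1
627 user0000001 Running 1
628 user0000001 Running 1
629 user0000001 Running 1
648 user000002 Running 1
649 user000002 Running 1
"""


def procs_per_user(rows: str) -> Counter:
    """Sum the PROC column per username for whitespace-separated rows."""
    totals = Counter()
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        _job, user, _state, procs = fields[:4]
        totals[user] += int(procs)
    return totals


totals = procs_per_user(ACTIVE)
for user, procs in sorted(totals.items()):
    print(user, procs)
print("total", sum(totals.values()))
```

By this count the listing agrees with the summary line: 8 processors for
user0000001, 3 for user000002, 11 of 14 in use.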
Current output from checkjob 630:
checking job 630
State: Idle
Creds: user:user0000001 group:users class:ch226 qos:DEFAULT
WallTime: 00:00:00 of 1:11:00:00
SubmitTime: Wed Mar 29 03:00:48
(Time Queued Total: 1:13:33:28 Eligible: 18:58:55)
Total Tasks: 1
Req[0] TaskCount: 1 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [NONE]
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 0
PartitionMask: [ALL]
Flags: RESTARTABLE
PE: 1.00 StartPriority: 1138
cannot select job 630 for partition DEFAULT (job 630 violates active HARD
MAXPROC limit of 14 for user user0000001 (R: 1, U: 16))
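To make the numbers in that rejection message easier to compare with
the showq listing, here is a small Python sketch that pulls the fields
out of the message text. The regular expression is written only against
this one message, the function name parse_limit_violation is
hypothetical, and reading R as "requested" and U as "current usage" is
an assumption, not something the output itself states.

```python
import re

# Message text copied from the checkjob output for job 630.
MESSAGE = (
    "cannot select job 630 for partition DEFAULT (job 630 violates active HARD "
    "MAXPROC limit of 14 for user user0000001 (R: 1, U: 16))"
)

# Pattern fitted to this single message; other Maui messages may differ.
PATTERN = re.compile(
    r"violates active (?P<kind>\w+) (?P<policy>\w+) limit of (?P<limit>\d+) "
    r"for user (?P<user>\S+) \(R: (?P<requested>\d+), U: (?P<usage>\d+)\)"
)


def parse_limit_violation(text: str):
    """Return the limit fields as a dict, or None if the text does not match."""
    # Collapse line wraps so a message split across lines still matches.
    match = PATTERN.search(" ".join(text.split()))
    if match is None:
        return None
    info = match.groupdict()
    for key in ("limit", "requested", "usage"):
        info[key] = int(info[key])
    return info


info = parse_limit_violation(MESSAGE)
print(info)
# Assumed reading: the job is refused when usage + requested exceeds the limit.
print(info["usage"] + info["requested"] > info["limit"])
```

Under that reading, the message reports 16 processors in use by
user0000001 against a hard limit of 14, whereas showq lists 8 running
single-processor jobs for the same user.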
--
Mike Renfro / R&D Engineer, Center for Manufacturing Research,
931 372-3601 / Tennessee Technological University -- [EMAIL PROTECTED]