Hi all, I am having an issue with a priority job not getting a reservation. When I set the reservation depth to 2, the second-highest-priority job does get a reservation, though.
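For reference, the change mentioned above amounts to a single line in maui.cfg. This is only a minimal sketch: the bucket index [0] is an assumption (the default reservation bucket), and the rest of the configuration on this cluster is not shown.

```
# maui.cfg (sketch; the [0] bucket index is an assumption)
# Number of priority reservations the scheduler may hold at once
RESERVATIONDEPTH[0]  2
```

With a depth of 1, only the single highest-priority idle job is eligible for a priority reservation, so a depth of 2 also lets the next job in the queue receive one.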
The cluster has 3552 cores available for the queue the job is submitted to, and at the moment they are all in use. Since the job has the highest priority, it should start reserving nodes (and it does try). When I change RESERVATIONDEPTH to 2, the second-highest-priority job does get a reservation, though this is a much smaller job. We don't have a size limit on jobs, and the cluster does have the resources for this job.

Does anyone know what may be going on here? We have a workflow where some people send in very large jobs and some send in small ones, so I would like to figure out what is happening.

Here is the checkjob output. As you can see, the job isn't requesting any resources other than cores. I have no idea where it is getting the idle procs from, since none are actually idle:

    checking job 213152

    State: Idle
    Creds:  user:user  group:group  class:default  qos:dedicated
    WallTime: 00:00:00 of 1:12:00:00
    SubmitTime: Fri Apr 6 03:35:23
      (Time Queued  Total: 7:45:59  Eligible: 1:30:06)

    Total Tasks: 1501

    Req[0]  TaskCount: 1501  Partition: ALL
    Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
    Opsys: [NONE]  Arch: [NONE]  Features: [default]

    IWD: [NONE]  Executable: [NONE]
    Bypass: 0  StartCount: 0
    PartitionMask: [ALL]
    Flags: RESTARTABLE PREEMPTEE DEDICATEDNODE
    Attr: PREEMPTEE
    PE: 1501.00  StartPriority: 144235
    job cannot run in partition DEFAULT (insufficient idle procs available: 1056 < 1501)

Here are the relevant log entries:

    04/06 03:35:24 MJobPReserve(213152,DEFAULT,ResCount,ResCountRej)
    04/06 03:35:24 INFO: 3552 feasible tasks found for job 213152:0 in partition DEFAULT (1501 Needed)
    04/06 03:35:24 ALERT: job 213152 cannot run in any partition
    04/06 03:35:24 ALERT: cannot create new reservation for job 213152 (shape[1] 1501)
    04/06 03:35:24 ALERT: cannot create new reservation for job 213152
    04/06 03:35:24 ALERT: job '213152' cannot run (deferring job for 3600 seconds)
    04/06 03:35:24 WARNING: cannot reserve priority job '213152'

--
Naveed Near-Ansari
E: [email protected]
O: 626-395-2212
M: 626-394-3845
_______________________________________________ mauiusers mailing list [email protected] http://www.supercluster.org/mailman/listinfo/mauiusers
