We have used a reservation depth of 3 for years without any noticeable
problems. Could this be related to a problem with preemption? We do not use
preemption in our setup. As a test, it could be worth turning it off to see
whether the reservations start working.
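
Note that the checkjob output and the log quoted below report two different
numbers: the log counts 3552 feasible tasks against 1501 needed, while
checkjob counts 1056 idle procs against 1501 needed. A minimal Python sketch
for pulling both out of the text so they can be compared over time follows.
The function names and regular expressions are my own and not part of Maui,
and the wrapped lines from the email have been joined by hand:

```python
import re

# Sample lines copied from the quoted checkjob output and Maui log,
# with the email line wrapping undone.
CHECKJOB_LINE = ("job cannot run in partition DEFAULT "
                 "(insufficient idle procs available: 1056 < 1501)")
LOG_LINE = ("04/06 03:35:24 INFO: 3552 feasible tasks found for job 213152:0 "
            "in partition DEFAULT (1501 Needed)")

# Hypothetical patterns, written only against the two lines above.
IDLE_RE = re.compile(r"insufficient idle procs available:\s*(\d+)\s*<\s*(\d+)")
FEASIBLE_RE = re.compile(
    r"(\d+) feasible tasks found for job (\d+):\d+ .*\((\d+) Needed\)")


def parse_idle(line):
    """Return (idle_procs, needed_procs), or None if the line does not match."""
    m = IDLE_RE.search(line)
    return (int(m.group(1)), int(m.group(2))) if m else None


def parse_feasible(line):
    """Return (job_id, feasible_tasks, needed_tasks), or None if no match."""
    m = FEASIBLE_RE.search(line)
    return (m.group(2), int(m.group(1)), int(m.group(3))) if m else None


idle, needed = parse_idle(CHECKJOB_LINE)
job_id, feasible, needed_tasks = parse_feasible(LOG_LINE)

print(f"job {job_id}: needs {needed}, feasible {feasible}, idle now {idle}")
print("enough feasible tasks:", feasible >= needed_tasks)
print("enough idle procs now:", idle >= needed)
```

With the quoted numbers, the feasible-task check passes and the idle-proc
check fails, which matches a job that cannot start now and so needs a
reservation.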
r.
On Friday 20. April 2012 14.27.57 Naveed Near-Ansari wrote:
> I know this isn't technically torque, but I haven't seen any activity on
> the maui list and I thought there might be some overlap in users here.
>
> I am having an issue with a priority job not getting a reservation. When
> I set reservation depth to 2, the second priority job does get a
> reservation though.
>
> The cluster has 3552 cores available for the queue it is submitted to; at
> the moment they are all in use. Since the job has the highest
> priority, it should start reserving nodes (and it does try). When I
> change the RESERVATIONDEPTH to 2, the second-highest-priority job does
> get a reservation, though this is a much smaller job. Perhaps I am
> misunderstanding how these reservations work. Is there a timeframe in
> which it has to reserve nodes?
>
> We don't have a size limit on jobs and the cluster does have the
> resources for this job.
>
> Does anyone know what may be going on here? We have this type of
> workflow where some people send in very large jobs, and some small, so I
> would like to figure out what is happening. Do you have any good
> strategies to deal with this type of workflow?
>
> Here is the checkjob output and, as you can see, it isn't requesting any
> resources other than cores. I have no idea where it is getting the
> idle procs from, since none are actually idle. Perhaps it has to do with
> reservable nodes? The idle procs count tends to fluctuate over time.
>
> checking job 213152
>
> State: Idle
> Creds: user:user group:group class:default qos:dedicated
> WallTime: 00:00:00 of 1:12:00:00
> SubmitTime: Fri Apr 6 03:35:23
> (Time Queued Total: 7:45:59 Eligible: 1:30:06)
>
> Total Tasks: 1501
>
> Req[0] TaskCount: 1501 Partition: ALL
> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
> Opsys: [NONE] Arch: [NONE] Features: [default]
>
>
> IWD: [NONE] Executable: [NONE]
> Bypass: 0 StartCount: 0
> PartitionMask: [ALL]
> Flags: RESTARTABLE PREEMPTEE DEDICATEDNODE
> Attr: PREEMPTEE
>
> PE: 1501.00 StartPriority: 144235
> job cannot run in partition DEFAULT (insufficient idle procs available:
> 1056 < 1501)
>
>
> Here are the relevant log entries:
>
> 04/06 03:35:24 MJobPReserve(213152,DEFAULT,ResCount,ResCountRej)
> 04/06 03:35:24 INFO: 3552 feasible tasks found for job 213152:0 in
> partition DEFAULT (1501 Needed)
> 04/06 03:35:24 ALERT: job 213152 cannot run in any partition
> 04/06 03:35:24 ALERT: cannot create new reservation for job 213152
> (shape[1] 1501)
> 04/06 03:35:24 ALERT: cannot create new reservation for job 213152
> 04/06 03:35:24 ALERT: job '213152' cannot run (deferring job for 3600
> seconds)
> 04/06 03:35:24 WARNING: cannot reserve priority job '213152'
--
The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.
phone:+47 77 64 41 07, fax:+47 77 64 41 00
Roy Dragseth, Team Leader, High Performance Computing
Direct call: +47 77 64 62 56. email: [email protected]
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers