Robin,

Some of our deadlines have passed, and I was able to take time today to look at this suspension problem in more detail. I have found that the solution is not a simple fix in the code, but a combination of settings and changes.

The first issue I investigated was why the suspended job's run priority was not growing over time; in other words, why the job was not "aging." To ensure the job's run priority would grow even in a suspended state, I implemented a new job priority weight factor called USAGEEXECUTIONTIMEWEIGHT. Like the other USAGE sub-component factors, it is applied only to active jobs and only takes effect if USAGEWEIGHT is set to something other than 0. A positive USAGEEXECUTIONTIMEWEIGHT will cause jobs that have a start time (including suspended jobs, as they were once started) to increase in run priority over time. With these settings the job should age properly.
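
As a rough sketch of what this might look like in maui.cfg (the values of 1 are only placeholders I am assuming for illustration, not tuned recommendations; scale them against the other priority weights at your site):

# Placeholder values, assumed for illustration only.
# USAGEWEIGHT must be non-zero or the USAGE sub-components are ignored.
USAGEWEIGHT               1
# A positive value lets started (including suspended) jobs gain priority over time.
USAGEEXECUTIONTIMEWEIGHT  1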

In my testing, I also found that an internal Maui attribute named the "suspension min time" could sometimes get in the way of resuming the suspended job. This attribute's purpose is to prevent Maui from suspending, resuming, and then suspending the same job again within the same iteration. (It prevents rapid "flipping" of jobs.) A job will not resume after being suspended until this min time has passed. Maui starts counting immediately after the job is suspended/resumed. This attribute was set to 60 seconds, and if the PREEMPTOR job finished before that time had elapsed, the suspended job would not resume because the min time had not yet been satisfied. Even with a growing priority, this min time could prevent jobs from being resumed. To reduce the chances of this happening, I decreased the "suspension min time" to 10 seconds.

The last way this issue can show up is when an advanced reservation is blocking the suspended job's ability to resume. This happens only if the PREEMPTOR job's wallclock limit is less than or equal to the suspended job's wallclock limit.

For example, suppose we have two jobs in the queue with the same priority, A_low and B_low, and B_low was submitted second. A_low starts and takes up the nodes needed by B_low. B_low is now in the Idle queue, but it creates a reservation in the future so that it is guaranteed to run after A_low completes. Next a PREEMPTOR job, C_high, comes in with a higher priority and suspends A_low so that C_high can run. The advanced reservation that B_low holds will now be adjusted to fit "around" the new wallclock limit of C_high. If C_high runs shorter than A_low does, then B_low's advanced reservation will move backward in time. When C_high ends and A_low tries to resume, it won't be able to, because B_low's advanced reservation will overlap A_low's run length. If, however, C_high's wallclock limit was longer than A_low's, then A_low can still squeeze in before B_low's reservation begins.

Perhaps the example was a little much, but I hope you get the idea. In Maui there is currently only one way to get around this: controlling the creation of advanced reservations. Depending on the needs of your cluster, you can disable advanced reservations altogether by using:

RESERVATIONPOLICY NEVER

in your maui.cfg. If this suspension problem really hurts the utilization of your cluster, then this solution may work best for your site. Otherwise, it may be a little overkill.

In Moab Workload Manager you can enable lower priority reservations to be "preempted" as well, allowing for A_low to run no matter where B_low's reservation begins. Adding this feature to Maui would, unfortunately, be quite the extensive effort and I don't foresee us being able to implement it anytime soon.

All of the above changes have been included in the most recent development snapshot available at http://www.clusterresources.com/downloads/maui/.

Let me know if you experience any problems or have any more questions. We appreciate the continuing support from the Maui community and their active participation in resolving bugs and creating enhancements.

--
Joshua Butikofer
Cluster Resources, Inc.

[EMAIL PROTECTED]
Voice: (801) 717-3707
Fax:   (801) 717-3738
--------------------------


Robin Humble wrote:
Hi,

On Thu, Apr 27, 2006 at 10:32:40AM -0600, Josh Butikofer wrote:
We've confirmed that this behavior is happening in Maui. Moab Workload Manager currently has the desired behavior with suspended jobs accruing priority (and also correctly handles different classes involved). We hope that over the next few weeks we will be able to make these improvements in Maui as well. We will keep the list posted on our progress.

any updates?

in case you were looking for a simpler test case, the below 2 queue
system seems to have the same behaviour as the previous bug report -
ie. the suspended PREEMPTEE job has a hard time resuming.

in other words after a PREEMPTOR job steams through (correctly) we end
up with a previously queued PREEMPTEE job then being chosen to run over
the top of the suspended PREEMPTEE job.

I don't think this is correct behaviour as only PREEMPTOR jobs should
be able to run over the top of PREEMPTEE jobs.

versions are:
torque 2.1.1-3 (rebuild on AS4 i686 from the fc5 .src.rpm), maui 3.2.6p16

relevant part of maui.cfg:

PREEMPTPOLICY SUSPEND
CLASSCFG[debug]      QDEF=high
CLASSCFG[workq]      QDEF=low
QOSCFG[high] PRIORITY=500 QFLAGS=PREEMPTOR
QOSCFG[low] PRIORITY=100 QFLAGS=PREEMPTEE
QOSWEIGHT       1

cheers,
robin

--
Joshua Butikofer
Cluster Resources, Inc.

[EMAIL PROTECTED]
(801) 798-7488
--------------------------


David Corredor wrote:
The problem is not just that the suspended job gets once again preempted
by a job of its same class from the IDLE queue; this happens regardless
of the class of the new job.

 Ex.  3 queues (1 verylong, 1 long, 1 fast.  Fast preempts long and
verylong, and long preempts verylong, verylong should not preempt).
   - Submit 1 long job so that it takes all resources in cluster.
   - Submit a verylong job so that it waits in the IDLE queue.
   - Submit a fast job.

 The fast job preempts the long one, and once it finishes, instead of the
long one resuming execution, the verylong kicks in and preempts it once
again (and it shouldn't).





<quote who="Ronny T. Lampert">

.....
However I experience the very same problem as you do (I need the
QUEUETIMEWEIGHT set to 1) - the preempted ones stay suspended and instead
a NEW job from the batch queue is started :-(

I think this is a bug: suspended jobs *should age*, too.
Or automatically get a slightly higher priority than the highest in the
same class to prevent it from staying suspended and interrupted by jobs
from the same class.

Could some developer shortly comment on that issue?

Thanks!
Ronny




_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers