We have the same problem and never resolved it within Maui. We work around it with a somewhat ugly combination of a prologue script and a cron script on the scheduler node. Between them, they offline nodes when suspending jobs start, add system reservations for the correct number of slots, online the nodes again, and clean up afterwards. I don't imagine there would be any problem with me sharing the code with you, if you're interested. In our case only one job class/queue causes suspension, so the logic of when to offline nodes is simple.
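A minimal sketch of how the offline / reserve / online / clean-up sequence described above might be ordered. This is not the actual scripts mentioned in the reply, which are not shown in this thread. The action names (`offline`, `reserve`, `online`, `release`), the `SuspendedJob` type, and the `susp-<jobid>` reservation naming are all hypothetical. Mapping each action onto real commands (for example Torque's `pbsnodes` and Maui's `setres`/`releaseres`) is deliberately left out, since the exact invocations depend on the site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuspendedJob:
    """Hypothetical record of a job that is about to be suspended."""
    job_id: str
    node: str
    slots: int  # number of slots the suspended job still holds on the node


def reservation_name(job_id):
    # Assumed naming scheme, used only to match reservations to jobs later.
    return f"susp-{job_id}"


def plan_on_suspend(job):
    """Prologue side: steps to take when a preemptor suspends `job`.

    The node is taken offline first so nothing new is scheduled onto it
    while the reservation covering the suspended job's slots is created.
    """
    return [
        ("offline", job.node),
        ("reserve", reservation_name(job.job_id), job.node, job.slots),
        ("online", job.node),
    ]


def plan_cleanup(reservations, still_suspended):
    """Cron side: release reservations whose job is no longer suspended."""
    keep = {reservation_name(j) for j in still_suspended}
    return [("release", name) for name in sorted(reservations) if name not in keep]


if __name__ == "__main__":
    job = SuspendedJob(job_id="1234", node="node01", slots=4)
    for step in plan_on_suspend(job):
        print(step)
    print(plan_cleanup({"susp-1234", "susp-99"}, still_suspended={"1234"}))
```

Keeping the planning separate from command execution makes the ordering easy to check before anything touches the scheduler.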
On 17 January 2013 16:13, Robert Jacobi <[email protected]> wrote:
> Hello All,
>
> Since I hadn't gotten any response yet, I just wanted to reiterate my
> request for help if anyone has an idea.
>
> Thanks,
> Robert
>
> On 01/05/2013 06:08 PM, Robert Jacobi wrote:
> > Hello,
> >
> > We have an issue on our cluster with the implementation of Preemption
> > using Maui. I have configured two queues and corresponding QOS levels
> > in maui.cfg, one preemptor, one preemptee (we are running Maui version
> > 3.3):
> >
> > PREEMPTPOLICY SUSPEND
> > QOSWEIGHT 10
> > QOSCFG[hi] PRIORITY=10000 QFLAGS=PREEMPTOR
> > QOSCFG[lo] PRIORITY=10 QFLAGS=PREEMPTEE
> > CLASSWEIGHT 10
> > CLASSCFG[debug] QDEF=hi QLIST=debug
> > CLASSCFG[low-priority] QDEF=lo
> >
> > When we have a preemptee job running and submit a preemptor job, the
> > preemptee job gets suspended just as it should, while the preemptor
> > job runs. If there is no other job in the queue, the suspended job
> > resumes execution afterwards and finishes properly.
> > The problem occurs if the preemptor needs fewer processors than the
> > preemptee, and there is another job in the queue that fits in the
> > resulting gap. For example, I submit a preemptee job on the two test
> > nodes we use. Then I submit a preemptor job that only needs one node.
> > Now, while the preemptor is running, one node is idle. If at this
> > point there are (other preemptee) jobs in the queue or submitted that
> > only need one node, then all these jobs will be executed before the
> > suspended job resumes, despite the suspended job waiting. This happens
> > even though they have lower priority (checked through diagnose -p)
> > than the suspended job and don't even fit into the backfill window
> > created by the preemptor job (i.e. have a greater wall time requirement).
> >
> > I have searched the archives of this list and read any related thread
> > I could find, and consequently tried ALL the following settings in
> > various combinations (and restarted Maui after each change):
> > BACKFILLPOLICY: NONE, FIRSTFIT, BESTFIT
> > RESERVATIONPOLICY: CURRENTHIGHEST, HIGHEST, NEVER
> > RESERVATIONDEPTH: 0, 1, 3
> >
> > I absolutely do not understand how Maui can overlook the suspended job
> > and execute a lower priority job in the same QOS from the queue, even
> > if backfilling and reservations are disabled.
> >
> > I'd appreciate any suggestions why this happens and how to mend it.
> > Robert
> >
>
> --
> Robert Jacobi
> Research Assistant
> University of Arizona
> Department of Aerospace & Mechanical Engineering
> 1130 N. Mountain Ave.
> Tucson, AZ, 85721-0119
>
> tel: +1 (520) 621 4369
> mail: [email protected]
>
>
> The less time you spent on algebra in life, the more time you have to be a
> happy person. (Kerschen)
>
> Doubt is not a pleasant condition, but certainty is absurd. (Voltaire)
>
> All great truths begin as blasphemies. (Shaw)
>
> Thinking is something that follows difficulties and precedes
> action. (Brecht)
>
> _______________________________________________
> mauiusers mailing list
> [email protected]
> http://www.supercluster.org/mailman/listinfo/mauiusers
>

--
Fraser McCrossan
SHARCNET Systems Administrator
University of Western Ontario
[email protected]
http://www.sharcnet.ca
(519) 661-2111 x80360
