Just to comment on the issue of priorities with suspended jobs. I have noticed that my problem getting suspended jobs to resume boils down to the following:
- Jobs only continue to increase their priority while they are in the IDLE queue. Suspended jobs are still in the RUN queue, so their priority stays fixed at the same value over time. I have seen this very clearly in the logs.
- As a result, the jobs in the IDLE queue eventually get a higher priority than the suspended job. The suspended job should ideally restart after the preemptor job finishes, but since another job in the IDLE queue already has a higher priority, that other job gets an automatic reservation for the nodes once they are free and "preempts" the suspended job once again. This happens regardless of whether the new job has the PREEMPTOR flag or not.

I've changed many settings and I think I have a working configuration (posted at the end). I have been testing it for several days, and I noticed that my second problem was the test jobs I was using to troubleshoot this issue:

- If the preemptor job runs and finishes in less than 30 seconds, the suspended job cannot resume because of an invalid start time (its start time is set in the future), and it gets jumped.
- If the preemptor job runs for over 30 seconds, everything works.

However, short (<30 sec) jobs are not uncommon (for example, users testing new binaries that may crash right away). When that happens, the user who submitted the long job and got preempted is out of luck.

Here is my configuration. I believe it is working (limited testing so far); the only remaining problem is preemptor jobs shorter than 30 seconds.
-------------------- from maui.cfg --------------------
RMPOLLINTERVAL 00:00:30
SERVERPORT 42559
SERVERMODE NORMAL

BACKFILLPOLICY BESTFIT
PREEMPTPOLICY SUSPEND
WCVIOLATIONACTION PREEMPT
RESERVATIONPOLICY NEVER

CREDWEIGHT 1
USERWEIGHT 0
GROUPWEIGHT 0
XFACTORWEIGHT 0
QOSWEIGHT 1
CLASSWEIGHT 1
RESWEIGHT 1
QUEUETIMEWEIGHT 0

JOBPRIOACCRUALPOLICY FULLPOLICY
JOBNODEMATCHPOLICY EXACTPROC
NODEALLOCATIONPOLICY PRIORITY
NODEAVAILABILITYPOLICY UTILIZED
NODEACCESSPOLICY SHARED
NODECFG[default] PRIORITYF='APROCS - LOAD + 0.01 * AMEM + 0.1 * ASWAP'

CLASSCFG[verylong] QDEF=verylong
CLASSCFG[long] QDEF=long
CLASSCFG[fast] QDEF=fast

QOSCFG[verylong] QFLAGS=PREEMPTEE PRIORITY=5
QOSCFG[long] QFLAGS=PREEMPTEE:PREEMPTOR PRIORITY=10
QOSCFG[fast] QFLAGS=PREEMPTOR PRIORITY=1000
---------------------------------------------------------------

David

On Thursday 20 April 2006 01:00 pm, [EMAIL PROTECTED] wrote:
> Date: Wed, 19 Apr 2006 23:02:34 -0700
> From: James Wigdahl <[EMAIL PROTECTED]>
> Subject: Re: [Mauiusers] Suspended jobs resume execution
> To: "Ronny T. Lampert" <[EMAIL PROTECTED]>
>
> On Apr 19, 2006, at 6:31 AM, Ronny T. Lampert wrote:
>
> > QOSCFG[short] PRIORITY=100 QFLAGS=PREEMPTOR
> > QOSCFG[default] PRIORITY=500 QFLAGS=PREEMPTEE
>
> Learn and use Maui's "diagnose -p". Your 'default' jobs always have
> higher priority than your 'short' jobs and will therefore always be
> favored. Flip the priorities here and you should see things start to
> work as you'd like.
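
P.S. To illustrate the priority drift I described at the top of this message, here is a rough Python sketch. It is only a toy model and not Maui's actual priority calculation: the function names, the linear "base + weight * queue time" formula, and the sample numbers are all my own assumptions. It just shows why a frozen suspended-job priority eventually loses to an idle job that keeps accruing queue time, and why a queue-time weight of 0 avoids that.

```python
# Toy model (assumption): an idle job's priority grows linearly with queue
# time, while a suspended job's priority is frozen at the value it had when
# it was suspended. This is NOT Maui's real priority formula.

def idle_priority(base, queue_time, queuetime_weight):
    """Priority of an idle job after waiting `queue_time` time units."""
    return base + queuetime_weight * queue_time


def time_until_overtaken(suspended_priority, idle_base, queuetime_weight,
                         horizon=10_000):
    """Return the first time step at which the idle job's priority exceeds
    the frozen priority of the suspended job, or None if that never
    happens within `horizon` steps."""
    if queuetime_weight <= 0:
        # No accrual: the idle job only wins if it already started higher.
        return 0 if idle_base > suspended_priority else None
    for t in range(horizon + 1):
        if idle_priority(idle_base, t, queuetime_weight) > suspended_priority:
            return t
    return None


if __name__ == "__main__":
    # Hypothetical numbers: suspended job frozen at 10, idle job starts at 5.
    print(time_until_overtaken(10, 5, queuetime_weight=1))
    print(time_until_overtaken(10, 5, queuetime_weight=0))
```

With a positive weight the idle job always overtakes the suspended one sooner or later; with the weight at 0 (as in QUEUETIMEWEIGHT 0 above) the ordering is decided only by the fixed QOS/class priorities.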
