Just to comment on the issue of priorities with suspended jobs. I have noticed
that my problem getting suspended jobs to resume boils down to the following:

   - Jobs only keep increasing their priority while they are in the IDLE
     queue, but suspended jobs are still in the RUN queue, so their priority
     stays fixed at the same value over time.
   - I have seen this very clearly in the logs.
   - So what happens is that the jobs in the IDLE queue eventually get a higher
     priority than the suspended job. The suspended job should ideally restart
     after the preemptor job finishes, but since the other job in the IDLE
     queue already has a higher priority, that job gets an automatic
     reservation for the nodes once they are free and "preempts" the
     suspended job once again. This happens regardless of whether the new job
     has the preemptor tag or not.


I've changed many settings and I think I have one working (posted at the end).
I had been testing this configuration for several days, and I noticed that my
second problem came from the test jobs I was using to troubleshoot the first:

   - I found out that if the preemptor job runs and finishes in less than 30
     seconds, the suspended job cannot resume because of an invalid start time
     (its start time is set in the future), and it gets passed over.
   - If the preemptor job runs for over 30 seconds, then it's all good. The
     catch is that short (<30 sec) jobs are not uncommon (for example, users
     testing new binaries that may crash right away). If that happens, the
     user who submitted the long job and got preempted is out of luck.

Here is my configuration. I believe it is working (limited testing so far);
the only problem is short (<30 sec) preemptor jobs.

--------------------
  from  maui.cfg
--------------------
RMPOLLINTERVAL        00:00:30
SERVERPORT            42559
SERVERMODE            NORMAL
BACKFILLPOLICY        BESTFIT
PREEMPTPOLICY     SUSPEND
WCVIOLATIONACTION PREEMPT
RESERVATIONPOLICY  NEVER
CREDWEIGHT            1
USERWEIGHT            0
GROUPWEIGHT           0
XFACTORWEIGHT         0
QOSWEIGHT             1
CLASSWEIGHT           1
RESWEIGHT             1
QUEUETIMEWEIGHT       0
JOBPRIOACCRUALPOLICY  FULLPOLICY
JOBNODEMATCHPOLICY EXACTPROC
NODEALLOCATIONPOLICY      PRIORITY
NODEAVAILABILITYPOLICY    UTILIZED
NODEACCESSPOLICY          SHARED
NODECFG[default] PRIORITYF='APROCS - LOAD + 0.01 * AMEM + 0.1 * ASWAP'
CLASSCFG[verylong]  QDEF=verylong
CLASSCFG[long]      QDEF=long
CLASSCFG[fast]      QDEF=fast
QOSCFG[verylong]  QFLAGS=PREEMPTEE            PRIORITY=5
QOSCFG[long]      QFLAGS=PREEMPTEE:PREEMPTOR  PRIORITY=10
QOSCFG[fast]      QFLAGS=PREEMPTOR            PRIORITY=1000

---------------------------------------------------------------
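As a sanity check on the QOS section above, the following sketch lists which QOS classes could preempt which. It relies on an assumed rule: a job can only preempt another if it has the PREEMPTOR flag, the target has the PREEMPTEE flag, and the preemptor has strictly higher priority. That rule fits the quoted advice to flip the priorities, but it is not confirmed Maui logic. The helper names are hypothetical.

```python
# QOS settings copied from the maui.cfg above.
QOS = {
    "verylong": {"flags": {"PREEMPTEE"}, "priority": 5},
    "long": {"flags": {"PREEMPTEE", "PREEMPTOR"}, "priority": 10},
    "fast": {"flags": {"PREEMPTOR"}, "priority": 1000},
}


def can_preempt(preemptor: str, preemptee: str, qos: dict = QOS) -> bool:
    # Assumed rule: PREEMPTOR flag on the attacker, PREEMPTEE flag on the
    # target, and a strictly higher priority for the attacker.
    return ("PREEMPTOR" in qos[preemptor]["flags"]
            and "PREEMPTEE" in qos[preemptee]["flags"]
            and qos[preemptor]["priority"] > qos[preemptee]["priority"])


def preemption_pairs(qos: dict = QOS) -> list:
    # Every (preemptor, preemptee) pair allowed under the assumed rule.
    return [(a, b) for a in qos for b in qos if a != b and can_preempt(a, b, qos)]


if __name__ == "__main__":
    print(preemption_pairs())
    # Ronny's settings from the quoted reply: 'short' can never preempt 'default'.
    ronny = {
        "short": {"flags": {"PREEMPTOR"}, "priority": 100},
        "default": {"flags": {"PREEMPTEE"}, "priority": 500},
    }
    print(preemption_pairs(ronny))
```

Under this assumption, `fast` can preempt both other classes and `long` can preempt `verylong`. In Ronny's quoted setup no preemption is possible until the priorities are flipped.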

David




On Thursday 20 April 2006 01:00 pm, [EMAIL PROTECTED] wrote:
> Message: 1
> Date: Wed, 19 Apr 2006 23:02:34 -0700
> From: James Wigdahl <[EMAIL PROTECTED]>
> Subject: Re: [Mauiusers] Suspended jobs resume execution
> To: "Ronny T. Lampert" <[EMAIL PROTECTED]>
> Cc: [email protected]
> 
> 
> On Apr 19, 2006, at 6:31 AM, Ronny T. Lampert wrote:
> 
> > QOSCFG[short]           PRIORITY=100 QFLAGS=PREEMPTOR
> > QOSCFG[default]         PRIORITY=500 QFLAGS=PREEMPTEE
> 
> Learn and use Maui's "diagnose -p". Your 'default' jobs always have  
> higher priority than your 'short' jobs and will therefore always be  
> favored. Flip the priorities here and you should see things start to  
> work as you'd like.
> 
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers