The following worked for me:

DEFERTIME 00:00:05
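
A minimal sketch of what that change looks like in maui.cfg, with the previous values noted as comments. The old values are taken from the showconfig output quoted further down. The comments on what each parameter does follow the Maui parameter documentation as best understood here. The idea that the preemptor's first start attempt fails because the preemptee's nodes have not been released yet is an assumption based on the RMFailure message, not something confirmed.

```
# Before (from showconfig):
#   DEFERTIME        1:00:00   # how long a deferred job is held before going back to Idle
#   DEFERSTARTCOUNT  1         # failed start attempts allowed before a job is deferred
#   DEFERCOUNT       24        # number of deferrals before the job gets a batch hold
#
# After: as far as this thread says, only DEFERTIME was changed.
DEFERTIME  00:00:05
```

With a defer time this short, DEFERCOUNT may be reached much sooner if starts keep failing, so it is probably worth watching for jobs ending up in batch hold.
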
But I was told that setting JOBAGGREGATIONTIME to something higher than the default would fix it too? I haven't tried that yet. Right now it's 00:00:00.

On Tue, Apr 20, 2010 at 4:45 PM, Tom Rudwick <[email protected]> wrote:
> Hi Andre,
>
> We have preemption working at our site on that version of maui.
>
> We have found that the settings below seem to be necessary for
> it to work at our site. I don't see a SYSCFG in your config,
> and I don't see a GROUPCFG for the admins group? I may be off
> base on these, since I know some bugs have been fixed since we
> got this working, but you may want to try setting those.
>
> On this line you set the "sys" QOS but I don't see it elsewhere...
>
> CLASSCFG[admins] MAXPROC=280 QDEF=sys PRIORITY=2001
>
> I see this "admins" one...
>
> QOSCFG[admins] QFLAGS=PREEMPTOR PRIORITY=1000
>
> Good luck,
>
> Tom
>
> ( this is a fragment of our maui config file ...)
>
> QOSWEIGHT 1
> SYSCFG QLIST=bigmem,integration,interactive,debug,regress,contingent
> QOSCFG[bigmem] PRIORITY=1 QFLAGS=PREEMPTOR,RESTARTPREEMPT
> QOSCFG[integration] PRIORITY=1 QFLAGS=USERESERVED
> QOSCFG[interactive] PRIORITY=2 QFLAGS=PREEMPTOR,RESTARTPREEMPT
> QOSCFG[debug] PRIORITY=1
> QOSCFG[regress] PRIORITY=-1
> QOSCFG[contingent] PRIORITY=-2 QFLAGS=PREEMPTEE
> GROUPCFG[users] QDEF=DEFAULT QLIST=bigmem,integration,interactive,debug,regress,contingent
> CLASSCFG[regress] QDEF=contingent
>
> Andre Gauthier wrote:
>>
>> Hi, I'm trying to get preemption to work with Maui and Torque. I
>> have a dozen queues, but one is defined as a preemptee (general queue &
>> qos) and another as a preemptor (admins queue & qos). I submit a job
>> to the queue that is a preemptee, then a job to the preemptor. The
>> preemptor does not run. Maui version 3.2.6p21, Torque version
>> 2.3.6-1.
>>
>> qstat:
>>
>> Job id                    Name             User            Time Use S Queue
>> ------------------------- ---------------- --------------- -------- - -----
>> 459.hpc-test              sleep.sh         user2           00:00:00 R general
>> 460.hpc-test              sleep.sh         user1                  0 Q admins
>>
>>
>> checkjob 460:
>>
>> checking job 460
>>
>> State: Idle EState: Deferred
>> Creds: user:user1 group:admins class:admins qos:admins
>> WallTime: 00:00:00 of 1:00:00
>> SubmitTime: Tue Apr 20 11:41:28
>> (Time Queued Total: 00:00:02 Eligible: 00:00:01)
>>
>> StartDate: 00:00:00 Tue Apr 20 11:41:30
>> Total Tasks: 8
>>
>> Req[0] TaskCount: 8 Partition: ALL
>> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
>> Opsys: [NONE] Arch: [NONE] Features: [NONE]
>> Dedicated Resources Per Task: PROCS: 1 MEM: 32M
>>
>>
>> IWD: [NONE] Executable: [NONE]
>> Bypass: 0 StartCount: 1
>> PartitionMask: [ALL]
>> Flags: RESTARTABLE PREEMPTOR
>>
>> job is deferred. Reason: RMFailure (cannot start job - RM failure,
>> rc: 15044, msg: 'Resource temporarily unavailable MSG=job allocation
>> request exceeds currently available cluster nodes, 1 requested, 0
>> available')
>> Holds: Defer (hold reason: RMFailure)
>> PE: 8.00 StartPriority: 3001
>> cannot select job 460 for partition DEFAULT (job hold active)
>>
>>
>> checkjob 459:
>>
>> checking job 459
>>
>> State: Running
>> Creds: user:user2 group:user2 class:general qos:general
>> WallTime: 00:03:05 of 1:00:00
>> SubmitTime: Tue Apr 20 11:41:11
>> (Time Queued Total: 00:00:19 Eligible: 00:00:01)
>>
>> StartTime: Tue Apr 20 11:41:30
>> Total Tasks: 96
>>
>> Req[0] TaskCount: 96 Partition: DEFAULT
>> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
>> Opsys: [NONE] Arch: [NONE] Features: [NONE]
>> Dedicated Resources Per Task: PROCS: 1 MEM: 2M
>> Allocated Nodes:
>> [compute-0-15:8][compute-0-13:8][compute-0-12:8][compute-0-11:8]
>> [compute-0-10:8][compute-0-9:8][compute-0-8:8][compute-0-7:8]
>> [compute-0-6:8][compute-0-5:8][compute-0-4:8][compute-0-3:8]
>>
>>
>> IWD: [NONE] Executable: [NONE]
>> Bypass: 0 StartCount: 2
>> PartitionMask: [ALL]
>> Flags: RESTARTABLE PREEMPTEE
>> Attr: PREEMPTEE
>>
>> Reservation '459' (-00:03:06 -> 00:56:54 Duration: 1:00:00)
>> PE: 96.00 StartPriority: 200
>>
>>
>> showconfig:
>>
>> [r...@hpc-test maui]# showconfig
>> # Maui version 3.2.6p21 (PID: 16046)
>> # global policies
>>
>> REJECTNEGPRIOJOBS[0] FALSE
>> ENABLENEGJOBPRIORITY[0] FALSE
>> ENABLEMULTINODEJOBS[0] TRUE
>> ENABLEMULTIREQJOBS[0] FALSE
>> BFPRIORITYPOLICY[0] [NONE]
>> JOBPRIOACCRUALPOLICY QUEUEPOLICY
>> NODELOADPOLICY ADJUSTSTATE
>> USEMACHINESPEED FALSE
>> USESYSTEMQUEUETIME TRUE
>> USELOCALMACHINEPRIORITY FALSE
>> NODEUNTRACKEDLOADFACTOR 1.2
>> JOBNODEMATCHPOLICY[0]
>>
>> JOBMAXSTARTTIME[0] INFINITY
>>
>> METAMAXTASKS[0] 0
>> NODESETPOLICY[0] [NONE]
>> NODESETATTRIBUTE[0] [NONE]
>> NODESETLIST[0]
>> NODESETDELAY[0] 00:00:00
>> NODESETPRIORITYTYPE[0] MINLOSS
>> NODESETTOLERANCE[0] 0.00
>>
>> BACKFILLPOLICY[0] FIRSTFIT
>> BACKFILLDEPTH[0] 0
>> BACKFILLPROCFACTOR[0] 0
>> BACKFILLMAXSCHEDULES[0] 10000
>> BACKFILLMETRIC[0] PROCS
>>
>> BFCHUNKDURATION[0] 00:00:00
>> BFCHUNKSIZE[0] 0
>> PREEMPTPOLICY[0] REQUEUE
>> MINADMINSTIME[0] 00:00:00
>> RESOURCELIMITPOLICY[0]
>> NODEAVAILABILITYPOLICY[0] COMBINED:[DEFAULT]
>> NODEALLOCATIONPOLICY[0] MINRESOURCE
>> TASKDISTRIBUTIONPOLICY[0] DEFAULT
>> RESERVATIONPOLICY[0] NEVER
>> RESERVATIONRETRYTIME[0] 00:00:00
>> RESERVATIONTHRESHOLDTYPE[0] NONE
>> RESERVATIONTHRESHOLDVALUE[0] 0
>>
>> FSPOLICY [NONE]
>> FSPOLICY [NONE]
>> FSINTERVAL 12:00:00
>> FSDEPTH 8
>> FSDECAY 1.00
>>
>>
>> # Priority Weights
>>
>> SERVICEWEIGHT[0] 1
>> TARGETWEIGHT[0] 1
>> CREDWEIGHT[0] 1
>> ATTRWEIGHT[0] 1
>> FSWEIGHT[0] 1
>> RESWEIGHT[0] 1
>> USAGEWEIGHT[0] 1
>> QUEUETIMEWEIGHT[0] 1
>> XFACTORWEIGHT[0] 0
>> SPVIOLATIONWEIGHT[0] 0
>> BYPASSWEIGHT[0] 0
>> TARGETQUEUETIMEWEIGHT[0] 0
>> TARGETXFACTORWEIGHT[0] 0
>> USERWEIGHT[0] 1
>> GROUPWEIGHT[0] 1
>> ACCOUNTWEIGHT[0] 0
>> QOSWEIGHT[0] 1
>> CLASSWEIGHT[0] 1
>> FSUSERWEIGHT[0] 0
>> FSGROUPWEIGHT[0] 0
>> FSACCOUNTWEIGHT[0] 0
>> FSQOSWEIGHT[0] 0
>> FSCLASSWEIGHT[0] 0
>> ATTRATTRWEIGHT[0] 0
>> ATTRSTATEWEIGHT[0] 0
>> NODEWEIGHT[0] 0
>> PROCWEIGHT[0] 0
>> MEMWEIGHT[0] 0
>> SWAPWEIGHT[0] 0
>> DISKWEIGHT[0] 0
>> PSWEIGHT[0] 0
>> PEWEIGHT[0] 0
>> WALLTIMEWEIGHT[0] 0
>> UPROCWEIGHT[0] 0
>> UJOBWEIGHT[0] 0
>> CONSUMEDWEIGHT[0] 0
>> USAGEEXECUTIONTIMEWEIGHT[0] 0
>> REMAININGWEIGHT[0] 0
>> PERCENTWEIGHT[0] 0
>> XFMINWCLIMIT[0] 00:02:00
>>
>>
>> # partition DEFAULT policies
>>
>> REJECTNEGPRIOJOBS[1] FALSE
>> ENABLENEGJOBPRIORITY[1] FALSE
>> ENABLEMULTINODEJOBS[1] TRUE
>> ENABLEMULTIREQJOBS[1] FALSE
>> BFPRIORITYPOLICY[1] [NONE]
>> JOBPRIOACCRUALPOLICY QUEUEPOLICY
>> NODELOADPOLICY ADJUSTSTATE
>> JOBNODEMATCHPOLICY[1]
>>
>> JOBMAXSTARTTIME[1] INFINITY
>>
>> METAMAXTASKS[1] 0
>> NODESETPOLICY[1] [NONE]
>> NODESETATTRIBUTE[1] [NONE]
>> NODESETLIST[1]
>> NODESETDELAY[1] 00:00:00
>> NODESETPRIORITYTYPE[1] MINLOSS
>> NODESETTOLERANCE[1] 0.00
>>
>> # Priority Weights
>>
>> XFMINWCLIMIT[1] 00:00:00
>>
>> RMAUTHTYPE[0] CHECKSUM
>>
>> CLASSCFG[[NONE]] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[[ALL]] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[DEFAULT] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[batch] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[interactive] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[general] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[priya] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[admins] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[sohrab] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[micro] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[altonji] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[easther] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[berry] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[hpcprog] DEFAULT.FEATURES=[NONE]
>> CLASSCFG[macro] DEFAULT.FEATURES=[NONE]
>> QOSPRIORITY[0] 0
>> QOSQTWEIGHT[0] 0
>> QOSXFWEIGHT[0] 0
>> QOSTARGETXF[0] 0.00
>> QOSTARGETQT[0] 00:00:00
>> QOSFLAGS[0]
>> QOSPRIORITY[1] 0
>> QOSQTWEIGHT[1] 0
>> QOSXFWEIGHT[1] 0
>> QOSTARGETXF[1] 0.00
>> QOSTARGETQT[1] 00:00:00
>> QOSFLAGS[1]
>> QOSPRIORITY[2] 100
>> QOSQTWEIGHT[2] 0
>> QOSXFWEIGHT[2] 0
>> QOSTARGETXF[2] 100.00
>> QOSTARGETQT[2] 00:00:00
>> QOSFLAGS[2]
>> QOSPRIORITY[3] -1000
>> QOSQTWEIGHT[3] 0
>> QOSXFWEIGHT[3] 0
>> QOSTARGETXF[3] 0.00
>> QOSTARGETQT[3] 00:00:00
>> QOSFLAGS[3]
>> QOSPRIORITY[4] 1000
>> QOSQTWEIGHT[4] 0
>> QOSXFWEIGHT[4] 0
>> QOSTARGETXF[4] 0.00
>> QOSTARGETQT[4] 00:00:00
>> QOSFLAGS[4] PREEMPTOR
>> QOSPRIORITY[5] 100
>> QOSQTWEIGHT[5] 0
>> QOSXFWEIGHT[5] 0
>> QOSTARGETXF[5] 0.00
>> QOSTARGETQT[5] 00:00:00
>> QOSFLAGS[5] PREEMPTEE
>> # SERVER MODULES: MX
>> SERVERMODE NORMAL
>> SERVERNAME
>> SERVERHOST hpc-test.wss.yale.edu
>> SERVERPORT 42559
>> LOGFILE maui.log
>> LOGFILEMAXSIZE 10000000
>> LOGFILEROLLDEPTH 1
>> LOGLEVEL 3
>> LOGFACILITY fALL
>> SERVERHOMEDIR /opt/maui/
>> TOOLSDIR /opt/maui/tools/
>> LOGDIR /opt/maui/log/
>> STATDIR /opt/maui/stats/
>> LOCKFILE /opt/maui/maui.pid
>> SERVERCONFIGFILE /opt/maui/maui.cfg
>> CHECKPOINTFILE /opt/maui/maui.ck
>> CHECKPOINTINTERVAL 00:05:00
>> CHECKPOINTEXPIRATIONTIME 3:11:20:00
>> TRAPJOB
>> TRAPNODE
>> TRAPFUNCTION
>> RESDEPTH 24
>>
>> RMPOLLINTERVAL 00:00:30
>> NODEACCESSPOLICY SHARED
>> ALLOCLOCALITYPOLICY [NONE]
>> SIMTIMEPOLICY [NONE]
>> ADMIN1 maui root
>> ADMINHOSTS ALL
>> NODEPOLLFREQUENCY 0
>> DISPLAYFLAGS
>> DEFAULTDOMAIN
>> DEFAULTCLASSLIST [DEFAULT:1]
>> FEATURENODETYPEHEADER
>> FEATUREPROCSPEEDHEADER
>> FEATUREPARTITIONHEADER
>> DEFERTIME 1:00:00
>> DEFERCOUNT 24
>> DEFERSTARTCOUNT 1
>> JOBPURGETIME 0
>> NODEPURGETIME 2140000000
>> APIFAILURETHRESHHOLD 6
>> NODESYNCTIME 600
>> JOBSYNCTIME 600
>> JOBMAXOVERRUN 00:10:00
>> NODEMAXLOAD 0.0
>>
>> PLOTMINTIME 120
>> PLOTMAXTIME 245760
>> PLOTTIMESCALE 11
>> PLOTMINPROC 1
>> PLOTMAXPROC 512
>> PLOTPROCSCALE 9
>> SCHEDCFG[] MODE=NORMAL SERVER=hpc-test.wss.yale.edu:42559
>> # RM MODULES: PBS SSS WIKI NATIVE
>> RMCFG[base] AUTHTYPE=CHECKSUM EPORT=15004 TIMEOUT=00:01:30 TYPE=PBS
>> SIMWORKLOADTRACEFILE workload
>> SIMRESOURCETRACEFILE resource
>> SIMAUTOSHUTDOWN OFF
>> SIMSTARTTIME 0
>> SIMSCALEJOBRUNTIME FALSE
>> SIMFLAGS
>> SIMJOBSUBMISSIONPOLICY CONSTANTJOBDEPTH
>> SIMINITIALQUEUEDEPTH 16
>> SIMWCACCURACY 0.00
>> SIMWCACCURACYCHANGE 0.00
>> SIMNODECOUNT 0
>> SIMNODECONFIGURATION NORMAL
>> SIMWCSCALINGPERCENT 100
>> SIMCOMRATE 0.10
>> SIMCOMTYPE ROUNDROBIN
>> COMINTRAFRAMECOST 0.30
>> COMINTERFRAMECOST 0.30
>> SIMSTOPITERATION -1
>> SIMEXITITERATION -1
>>
>>
>> cat maui.cfg:
>>
>> # maui.cfg.tmpl for Maui v3.2.5
>>
>> # full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
>> # use the 'schedctl -l' command to display current configuration
>>
>> RMPOLLINTERVAL 00:00:30
>>
>> SERVERHOST hpc-test.wss.yale.edu
>> SERVERPORT 42559
>> SERVERMODE NORMAL
>>
>> RMCFG[base] TYPE=PBS TIMEOUT=90
>>
>> # Admin: http://supercluster.org/mauidocs/a.esecurity.html
>> # ADMIN1 users have full scheduler control
>>
>> ADMIN1 maui root
>>
>> LOGFILE maui.log
>> LOGFILEMAXSIZE 10000000
>> LOGLEVEL 3
>>
>> # Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
>>
>> QUEUETIMEWEIGHT 1
>>
>> # FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
>>
>> #FSPOLICY PSDEDICATED
>> #FSDEPTH 7
>> #FSINTERVAL 86400
>> #FSDECAY 0.80
>>
>> # Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
>>
>> # NONE SPECIFIED
>>
>> # Backfill: http://supercluster.org/mauidocs/8.2backfill.html
>>
>> BACKFILLPOLICY FIRSTFIT
>> RESERVATIONPOLICY NEVER # set to never for premption.
>>
>> # Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
>>
>> NODEALLOCATIONPOLICY MINRESOURCE
>>
>> # QOS: http://supercluster.org/mauidocs/7.3qos.html
>>
>> QOSCFG[hi] PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
>> QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
>>
>> # Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html
>>
>> # SRSTARTTIME[test] 8:00:00
>> # SRENDTIME[test] 17:00:00
>> # SRDAYS[test] MON TUE WED THU FRI
>> # SRTASKCOUNT[test] 20
>> # SRMAXTIME[test] 0:30:00
>>
>> #PREEMPTPOLICY set by AG
>> PREEMPTIONPOLICY REQUEUE
>>
>> # Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
>>
>> USERCFG[DEFAULT] FSTARGET=25.0
>> USERCFG[john] PRIORITY=100 FSTARGET=10.0-
>> GROUPCFG[staff] PRIORITY=1000 QLIST=hi:low QDEF=hi
>> CLASSCFG[batch] FLAGS=PREEMPTEE
>> CLASSCFG[interactive] FLAGS=PREEMPTOR
>>
>> ###set QOS needed for premptions
>> QOSWEIGHT 1
>> QOSCFG[admins] QFLAGS=PREEMPTOR PRIORITY=1000
>> QOSCFG[general] QFLAGS=PREEMPTEE PRIORITY=100
>>
>> GROUPWEIGHT 1
>> CLASSWEIGHT 1
>> CREDWEIGHT 1
>> USERWEIGHT 1
>>
>>
>> CLASSCFG[general] QDEF=general PRIORITY=100
>>
>> GROUPWEIGHT 1
>> CLASSCFG[DEFAULT] MAXPROC=280 QDEF=general PRIORITY=200
>> CLASSCFG[admins] MAXPROC=280 QDEF=sys PRIORITY=2001
>> _______________________________________________
>> mauiusers mailing list
>> [email protected]
>> http://www.supercluster.org/mailman/listinfo/mauiusers
