Hi Andre,

We have preemption working at our site on that version of Maui. We have found that the settings below seem to be necessary for it to work at our site. I don't see a SYSCFG in your config, and I don't see a GROUPCFG for the admins group. I may be off base on these, since I know some bugs have been fixed since we got this working, but you may want to try setting those.

On this line you set the "sys" QOS, but I don't see it defined elsewhere:

CLASSCFG[admins] MAXPROC=280 QDEF=sys PRIORITY=2001

I do see this "admins" one:

QOSCFG[admins] QFLAGS=PREEMPTOR PRIORITY=1000

Good luck,
Tom

(This is a fragment of our maui config file ...)

QOSWEIGHT 1
SYSCFG QLIST=bigmem,integration,interactive,debug,regress,contingent
QOSCFG[bigmem] PRIORITY=1 QFLAGS=PREEMPTOR,RESTARTPREEMPT
QOSCFG[integration] PRIORITY=1 QFLAGS=USERESERVED
QOSCFG[interactive] PRIORITY=2 QFLAGS=PREEMPTOR,RESTARTPREEMPT
QOSCFG[debug] PRIORITY=1
QOSCFG[regress] PRIORITY=-1
QOSCFG[contingent] PRIORITY=-2 QFLAGS=PREEMPTEE
GROUPCFG[users] QDEF=DEFAULT QLIST=bigmem,integration,interactive,debug,regress,contingent
CLASSCFG[regress] QDEF=contingent

Andre Gauthier wrote:
> Hi, I'm trying to get preemption to work with Maui and Torque. I
> have a dozen queues, but one is defined as a preemptee (general queue &
> qos) and another as a preemptor (admins queue & qos). I submit a job
> to the queue that is a preemptee, then a job to the preemptor. The
> preemptor does not run. Maui version 3.2.6p21, Torque Version
> 2.3.6-1.
>
> qstat:
>
> Job id                    Name             User            Time Use S Queue
> ------------------------- ---------------- --------------- -------- - -----
> 459.hpc-test              sleep.sh         user2           00:00:00 R general
> 460.hpc-test              sleep.sh         user1                  0 Q admins
>
> checkjob 460:
>
> checking job 460
>
> State: Idle EState: Deferred
> Creds: user:user1 group:admins class:admins qos:admins
> WallTime: 00:00:00 of 1:00:00
> SubmitTime: Tue Apr 20 11:41:28
> (Time Queued Total: 00:00:02 Eligible: 00:00:01)
>
> StartDate: 00:00:00 Tue Apr 20 11:41:30
> Total Tasks: 8
>
> Req[0] TaskCount: 8 Partition: ALL
> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
> Opsys: [NONE] Arch: [NONE] Features: [NONE]
> Dedicated Resources Per Task: PROCS: 1 MEM: 32M
>
> IWD: [NONE] Executable: [NONE]
> Bypass: 0 StartCount: 1
> PartitionMask: [ALL]
> Flags: RESTARTABLE PREEMPTOR
>
> job is deferred. Reason: RMFailure (cannot start job - RM failure,
> rc: 15044, msg: 'Resource temporarily unavailable MSG=job allocation
> request exceeds currently available cluster nodes, 1 requested, 0
> available')
> Holds: Defer (hold reason: RMFailure)
> PE: 8.00 StartPriority: 3001
> cannot select job 460 for partition DEFAULT (job hold active)
>
> checkjob 459:
>
> checking job 459
>
> State: Running
> Creds: user:user2 group:user2 class:general qos:general
> WallTime: 00:03:05 of 1:00:00
> SubmitTime: Tue Apr 20 11:41:11
> (Time Queued Total: 00:00:19 Eligible: 00:00:01)
>
> StartTime: Tue Apr 20 11:41:30
> Total Tasks: 96
>
> Req[0] TaskCount: 96 Partition: DEFAULT
> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
> Opsys: [NONE] Arch: [NONE] Features: [NONE]
> Dedicated Resources Per Task: PROCS: 1 MEM: 2M
> Allocated Nodes:
> [compute-0-15:8][compute-0-13:8][compute-0-12:8][compute-0-11:8]
> [compute-0-10:8][compute-0-9:8][compute-0-8:8][compute-0-7:8]
> [compute-0-6:8][compute-0-5:8][compute-0-4:8][compute-0-3:8]
>
> IWD: [NONE] Executable: [NONE]
> Bypass: 0 StartCount: 2
> PartitionMask: [ALL]
> Flags: RESTARTABLE PREEMPTEE
> Attr: PREEMPTEE
>
> Reservation '459' (-00:03:06 -> 00:56:54 Duration: 1:00:00)
> PE: 96.00 StartPriority: 200
>
> showconfig:
>
> [r...@hpc-test maui]# showconfig
> # Maui version 3.2.6p21 (PID: 16046)
> # global policies
>
> REJECTNEGPRIOJOBS[0] FALSE
> ENABLENEGJOBPRIORITY[0] FALSE
> ENABLEMULTINODEJOBS[0] TRUE
> ENABLEMULTIREQJOBS[0] FALSE
> BFPRIORITYPOLICY[0] [NONE]
> JOBPRIOACCRUALPOLICY QUEUEPOLICY
> NODELOADPOLICY ADJUSTSTATE
> USEMACHINESPEED FALSE
> USESYSTEMQUEUETIME TRUE
> USELOCALMACHINEPRIORITY FALSE
> NODEUNTRACKEDLOADFACTOR 1.2
> JOBNODEMATCHPOLICY[0]
>
> JOBMAXSTARTTIME[0] INFINITY
>
> METAMAXTASKS[0] 0
> NODESETPOLICY[0] [NONE]
> NODESETATTRIBUTE[0] [NONE]
> NODESETLIST[0]
> NODESETDELAY[0] 00:00:00
> NODESETPRIORITYTYPE[0] MINLOSS
> NODESETTOLERANCE[0] 0.00
>
> BACKFILLPOLICY[0] FIRSTFIT
> BACKFILLDEPTH[0] 0
> BACKFILLPROCFACTOR[0] 0
> BACKFILLMAXSCHEDULES[0] 10000
> BACKFILLMETRIC[0] PROCS
>
> BFCHUNKDURATION[0] 00:00:00
> BFCHUNKSIZE[0] 0
> PREEMPTPOLICY[0] REQUEUE
> MINADMINSTIME[0] 00:00:00
> RESOURCELIMITPOLICY[0]
> NODEAVAILABILITYPOLICY[0] COMBINED:[DEFAULT]
> NODEALLOCATIONPOLICY[0] MINRESOURCE
> TASKDISTRIBUTIONPOLICY[0] DEFAULT
> RESERVATIONPOLICY[0] NEVER
> RESERVATIONRETRYTIME[0] 00:00:00
> RESERVATIONTHRESHOLDTYPE[0] NONE
> RESERVATIONTHRESHOLDVALUE[0] 0
>
> FSPOLICY [NONE]
> FSPOLICY [NONE]
> FSINTERVAL 12:00:00
> FSDEPTH 8
> FSDECAY 1.00
>
> # Priority Weights
>
> SERVICEWEIGHT[0] 1
> TARGETWEIGHT[0] 1
> CREDWEIGHT[0] 1
> ATTRWEIGHT[0] 1
> FSWEIGHT[0] 1
> RESWEIGHT[0] 1
> USAGEWEIGHT[0] 1
> QUEUETIMEWEIGHT[0] 1
> XFACTORWEIGHT[0] 0
> SPVIOLATIONWEIGHT[0] 0
> BYPASSWEIGHT[0] 0
> TARGETQUEUETIMEWEIGHT[0] 0
> TARGETXFACTORWEIGHT[0] 0
> USERWEIGHT[0] 1
> GROUPWEIGHT[0] 1
> ACCOUNTWEIGHT[0] 0
> QOSWEIGHT[0] 1
> CLASSWEIGHT[0] 1
> FSUSERWEIGHT[0] 0
> FSGROUPWEIGHT[0] 0
> FSACCOUNTWEIGHT[0] 0
> FSQOSWEIGHT[0] 0
> FSCLASSWEIGHT[0] 0
> ATTRATTRWEIGHT[0] 0
> ATTRSTATEWEIGHT[0] 0
> NODEWEIGHT[0] 0
> PROCWEIGHT[0] 0
> MEMWEIGHT[0] 0
> SWAPWEIGHT[0] 0
> DISKWEIGHT[0] 0
> PSWEIGHT[0] 0
> PEWEIGHT[0] 0
> WALLTIMEWEIGHT[0] 0
> UPROCWEIGHT[0] 0
> UJOBWEIGHT[0] 0
> CONSUMEDWEIGHT[0] 0
> USAGEEXECUTIONTIMEWEIGHT[0] 0
> REMAININGWEIGHT[0] 0
> PERCENTWEIGHT[0] 0
> XFMINWCLIMIT[0] 00:02:00
>
> # partition DEFAULT policies
>
> REJECTNEGPRIOJOBS[1] FALSE
> ENABLENEGJOBPRIORITY[1] FALSE
> ENABLEMULTINODEJOBS[1] TRUE
> ENABLEMULTIREQJOBS[1] FALSE
> BFPRIORITYPOLICY[1] [NONE]
> JOBPRIOACCRUALPOLICY QUEUEPOLICY
> NODELOADPOLICY ADJUSTSTATE
> JOBNODEMATCHPOLICY[1]
>
> JOBMAXSTARTTIME[1] INFINITY
>
> METAMAXTASKS[1] 0
> NODESETPOLICY[1] [NONE]
> NODESETATTRIBUTE[1] [NONE]
> NODESETLIST[1]
> NODESETDELAY[1] 00:00:00
> NODESETPRIORITYTYPE[1] MINLOSS
> NODESETTOLERANCE[1] 0.00
>
> # Priority Weights
>
> XFMINWCLIMIT[1] 00:00:00
>
> RMAUTHTYPE[0] CHECKSUM
>
> CLASSCFG[[NONE]] DEFAULT.FEATURES=[NONE]
> CLASSCFG[[ALL]] DEFAULT.FEATURES=[NONE]
> CLASSCFG[DEFAULT] DEFAULT.FEATURES=[NONE]
> CLASSCFG[batch] DEFAULT.FEATURES=[NONE]
> CLASSCFG[interactive] DEFAULT.FEATURES=[NONE]
> CLASSCFG[general] DEFAULT.FEATURES=[NONE]
> CLASSCFG[priya] DEFAULT.FEATURES=[NONE]
> CLASSCFG[admins] DEFAULT.FEATURES=[NONE]
> CLASSCFG[sohrab] DEFAULT.FEATURES=[NONE]
> CLASSCFG[micro] DEFAULT.FEATURES=[NONE]
> CLASSCFG[altonji] DEFAULT.FEATURES=[NONE]
> CLASSCFG[easther] DEFAULT.FEATURES=[NONE]
> CLASSCFG[berry] DEFAULT.FEATURES=[NONE]
> CLASSCFG[hpcprog] DEFAULT.FEATURES=[NONE]
> CLASSCFG[macro] DEFAULT.FEATURES=[NONE]
> QOSPRIORITY[0] 0
> QOSQTWEIGHT[0] 0
> QOSXFWEIGHT[0] 0
> QOSTARGETXF[0] 0.00
> QOSTARGETQT[0] 00:00:00
> QOSFLAGS[0]
> QOSPRIORITY[1] 0
> QOSQTWEIGHT[1] 0
> QOSXFWEIGHT[1] 0
> QOSTARGETXF[1] 0.00
> QOSTARGETQT[1] 00:00:00
> QOSFLAGS[1]
> QOSPRIORITY[2] 100
> QOSQTWEIGHT[2] 0
> QOSXFWEIGHT[2] 0
> QOSTARGETXF[2] 100.00
> QOSTARGETQT[2] 00:00:00
> QOSFLAGS[2]
> QOSPRIORITY[3] -1000
> QOSQTWEIGHT[3] 0
> QOSXFWEIGHT[3] 0
> QOSTARGETXF[3] 0.00
> QOSTARGETQT[3] 00:00:00
> QOSFLAGS[3]
> QOSPRIORITY[4] 1000
> QOSQTWEIGHT[4] 0
> QOSXFWEIGHT[4] 0
> QOSTARGETXF[4] 0.00
> QOSTARGETQT[4] 00:00:00
> QOSFLAGS[4] PREEMPTOR
> QOSPRIORITY[5] 100
> QOSQTWEIGHT[5] 0
> QOSXFWEIGHT[5] 0
> QOSTARGETXF[5] 0.00
> QOSTARGETQT[5] 00:00:00
> QOSFLAGS[5] PREEMPTEE
> # SERVER MODULES: MX
> SERVERMODE NORMAL
> SERVERNAME
> SERVERHOST hpc-test.wss.yale.edu
> SERVERPORT 42559
> LOGFILE maui.log
> LOGFILEMAXSIZE 10000000
> LOGFILEROLLDEPTH 1
> LOGLEVEL 3
> LOGFACILITY fALL
> SERVERHOMEDIR /opt/maui/
> TOOLSDIR /opt/maui/tools/
> LOGDIR /opt/maui/log/
> STATDIR /opt/maui/stats/
> LOCKFILE /opt/maui/maui.pid
> SERVERCONFIGFILE /opt/maui/maui.cfg
> CHECKPOINTFILE /opt/maui/maui.ck
> CHECKPOINTINTERVAL 00:05:00
> CHECKPOINTEXPIRATIONTIME 3:11:20:00
> TRAPJOB
> TRAPNODE
> TRAPFUNCTION
> RESDEPTH 24
>
> RMPOLLINTERVAL 00:00:30
> NODEACCESSPOLICY SHARED
> ALLOCLOCALITYPOLICY [NONE]
> SIMTIMEPOLICY [NONE]
> ADMIN1 maui root
> ADMINHOSTS ALL
> NODEPOLLFREQUENCY 0
> DISPLAYFLAGS
> DEFAULTDOMAIN
> DEFAULTCLASSLIST [DEFAULT:1]
> FEATURENODETYPEHEADER
> FEATUREPROCSPEEDHEADER
> FEATUREPARTITIONHEADER
> DEFERTIME 1:00:00
> DEFERCOUNT 24
> DEFERSTARTCOUNT 1
> JOBPURGETIME 0
> NODEPURGETIME 2140000000
> APIFAILURETHRESHHOLD 6
> NODESYNCTIME 600
> JOBSYNCTIME 600
> JOBMAXOVERRUN 00:10:00
> NODEMAXLOAD 0.0
>
> PLOTMINTIME 120
> PLOTMAXTIME 245760
> PLOTTIMESCALE 11
> PLOTMINPROC 1
> PLOTMAXPROC 512
> PLOTPROCSCALE 9
> SCHEDCFG[] MODE=NORMAL SERVER=hpc-test.wss.yale.edu:42559
> # RM MODULES: PBS SSS WIKI NATIVE
> RMCFG[base] AUTHTYPE=CHECKSUM EPORT=15004 TIMEOUT=00:01:30 TYPE=PBS
> SIMWORKLOADTRACEFILE workload
> SIMRESOURCETRACEFILE resource
> SIMAUTOSHUTDOWN OFF
> SIMSTARTTIME 0
> SIMSCALEJOBRUNTIME FALSE
> SIMFLAGS
> SIMJOBSUBMISSIONPOLICY CONSTANTJOBDEPTH
> SIMINITIALQUEUEDEPTH 16
> SIMWCACCURACY 0.00
> SIMWCACCURACYCHANGE 0.00
> SIMNODECOUNT 0
> SIMNODECONFIGURATION NORMAL
> SIMWCSCALINGPERCENT 100
> SIMCOMRATE 0.10
> SIMCOMTYPE ROUNDROBIN
> COMINTRAFRAMECOST 0.30
> COMINTERFRAMECOST 0.30
> SIMSTOPITERATION -1
> SIMEXITITERATION -1
>
> cat maui.cfg:
>
> # maui.cfg.tmpl for Maui v3.2.5
>
> # full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
> # use the 'schedctl -l' command to display current configuration
>
> RMPOLLINTERVAL 00:00:30
>
> SERVERHOST hpc-test.wss.yale.edu
> SERVERPORT 42559
> SERVERMODE NORMAL
>
> RMCFG[base] TYPE=PBS TIMEOUT=90
>
> # Admin: http://supercluster.org/mauidocs/a.esecurity.html
> # ADMIN1 users have full scheduler control
>
> ADMIN1 maui root
>
> LOGFILE maui.log
> LOGFILEMAXSIZE 10000000
> LOGLEVEL 3
>
> # Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
>
> QUEUETIMEWEIGHT 1
>
> # FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
>
> #FSPOLICY PSDEDICATED
> #FSDEPTH 7
> #FSINTERVAL 86400
> #FSDECAY 0.80
>
> # Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
>
> # NONE SPECIFIED
>
> # Backfill: http://supercluster.org/mauidocs/8.2backfill.html
>
> BACKFILLPOLICY FIRSTFIT
> RESERVATIONPOLICY NEVER # set to never for premption.
>
> # Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
>
> NODEALLOCATIONPOLICY MINRESOURCE
>
> # QOS: http://supercluster.org/mauidocs/7.3qos.html
>
> QOSCFG[hi] PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
> QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
>
> # Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html
>
> # SRSTARTTIME[test] 8:00:00
> # SRENDTIME[test] 17:00:00
> # SRDAYS[test] MON TUE WED THU FRI
> # SRTASKCOUNT[test] 20
> # SRMAXTIME[test] 0:30:00
>
> #PREEMPTPOLICY set by AG
> PREEMPTIONPOLICY REQUEUE
>
> # Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
>
> USERCFG[DEFAULT] FSTARGET=25.0
> USERCFG[john] PRIORITY=100 FSTARGET=10.0-
> GROUPCFG[staff] PRIORITY=1000 QLIST=hi:low QDEF=hi
> CLASSCFG[batch] FLAGS=PREEMPTEE
> CLASSCFG[interactive] FLAGS=PREEMPTOR
>
> ###set QOS needed for premptions
> QOSWEIGHT 1
> QOSCFG[admins] QFLAGS=PREEMPTOR PRIORITY=1000
> QOSCFG[general] QFLAGS=PREEMPTEE PRIORITY=100
>
> GROUPWEIGHT 1
> CLASSWEIGHT 1
> CREDWEIGHT 1
> USERWEIGHT 1
>
> CLASSCFG[general] QDEF=general PRIORITY=100
>
> GROUPWEIGHT 1
> CLASSCFG[DEFAULT] MAXPROC=280 QDEF=general PRIORITY=200
> CLASSCFG[admins] MAXPROC=280 QDEF=sys PRIORITY=2001
> _______________________________________________
> mauiusers mailing list
> [email protected]
> http://www.supercluster.org/mailman/listinfo/mauiusers
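
P.S. The "sys" QOS mismatch mentioned in the reply at the top of this message can be checked mechanically. The following is only a rough sketch: undefined_qos_refs is a hypothetical helper written for this illustration and is not part of Maui. It assumes that a QOS counts as "defined" only when a QOSCFG[name] line exists, that QLIST values are separated by commas or colons (both forms appear in this thread), and that DEFAULT is built in. None of these assumptions has been checked against how Maui itself parses maui.cfg.

```python
import re

# Excerpt of the quoted maui.cfg from this thread, used as sample input.
SAMPLE_CFG = """
QOSCFG[hi]  PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
GROUPCFG[staff] PRIORITY=1000 QLIST=hi:low QDEF=hi
###set QOS needed for premptions
QOSCFG[admins] QFLAGS=PREEMPTOR PRIORITY=1000
QOSCFG[general] QFLAGS=PREEMPTEE PRIORITY=100
CLASSCFG[general] QDEF=general PRIORITY=100
CLASSCFG[DEFAULT] MAXPROC=280 QDEF=general PRIORITY=200
CLASSCFG[admins] MAXPROC=280 QDEF=sys PRIORITY=2001
"""


def undefined_qos_refs(cfg_text):
    """Return sorted QOS names used in QDEF=/QLIST= that have no QOSCFG[...] line.

    Hypothetical lint helper; the parsing rules here are assumptions,
    not Maui's actual config grammar.
    """
    defined = set()
    referenced = set()
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        match = re.match(r"QOSCFG\[([^\]]+)\]", line)
        if match:
            defined.add(match.group(1))
        for _key, value in re.findall(r"\b(QDEF|QLIST)=(\S+)", line):
            for name in re.split(r"[,:]", value):
                # Assumption: DEFAULT needs no explicit QOSCFG entry.
                if name and name != "DEFAULT":
                    referenced.add(name)
    return sorted(referenced - defined)


if __name__ == "__main__":
    print(undefined_qos_refs(SAMPLE_CFG))
```

Run against the excerpt above, this reports only "sys", the QOS named by CLASSCFG[admins] QDEF=sys. Whether that mismatch is what keeps job 460 from starting is a separate question that the script cannot answer.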
