Hi All,

I'm trying to implement a single queue in Torque with multiple QoS levels in Maui, so that I can manage priorities and the like in one place, Maui. The idea is that jobs specify -l qos=<level> at submission time for short, long, high, low, and normal work. Below are my maui.cfg file and the details for a job submitted with qsub -I -l qos=long. Of particular note is that regardless of the -l qos setting, the job's QOS ends up as normal. Please help; I seem to be missing something rather obvious.

Please note that I did try QDEF=normal QLIST=normal,low,high,long,debug on both USERCFG[DEFAULT] and CLASSCFG[DEFAULT], which, if I understand correctly, should have given my users the option to specify these QoS levels via #PBS -l qos=<level>.
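Roughly, what I tried looked like the following (shown here only as a sketch; the exact lines in the file may have differed slightly):

# Sketch of the earlier attempt: attach the QoS list to the default user
# and/or class credential so any user can request one of the listed levels
USERCFG[DEFAULT]   QDEF=normal QLIST=normal,low,high,long,debug
CLASSCFG[DEFAULT]  QDEF=normal QLIST=normal,low,high,long,debug

# and then request a level at submission time, either on the command line:
#   qsub -I -l qos=long
# or inside a job script:
#   #PBS -l qos=long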
######## MAUI.CFG #############
SERVERHOST            queen
# primary admin must be first in list
ADMIN1                root
ADMIN3                ALL

# Resource Manager Definition
RMCFG[BASE]           TYPE=PBS
AMCFG[bank]           TYPE=NONE

# full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
# use the 'schedctl -l' command to display current configuration
JOBAGGREGATIONTIME    00:00:10
RMPOLLINTERVAL        00:00:30
SERVERPORT            42559
SERVERMODE            NORMAL

# Admin: http://supercluster.org/mauidocs/a.esecurity.html
LOGFILE               maui.log
LOGFILEMAXSIZE        10000000
LOGLEVEL              3

# Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
SERVWEIGHT            1
QUEUETIMEWEIGHT       10

# FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
FSPOLICY              DEDICATEDPES
FSDEPTH               7
FSINTERVAL            24:00:00
FSDECAY               0.80
FSWEIGHT              5
FSUSERWEIGHT          10
FSGROUPWEIGHT         1
FSCLASSWEIGHT         1
FSACCOUNTWEIGHT       1
USERWEIGHT            10
GROUPWEIGHT           5
QOSWEIGHT             1

# These lines provide a way to avoid "job starvation": as a job sits queued, its priority grows.
XFACTORWEIGHT         3
XFWEIGHT              7
XFCAP                 1000000

# Purge job information. Keep for 28 days
JOBPURGETIME          28:00:00:00

# Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
# jobs exceeding limits don't increase their priority
MAXJOBQUEUEDPERUSERCOUNT 30

# Backfill: http://supercluster.org/mauidocs/8.2backfill.html
BACKFILLPOLICY        FIRSTFIT
RESERVATIONPOLICY     CURRENTHIGHEST

# Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
NODEALLOCATIONPOLICY  MINRESOURCE

# Increase priority for queued jobs
JOBPRIOACCRUALPOLICY
MAXJOBQUEUEDPERUSERPOLICY ON

# Allow users to specify multiple requirements for jobs
# resource specifications such as '-l nodes=3:fast+1:io'
ENABLEMULTIREQJOBS    TRUE

# Job Preemption
# specifies how preemptible jobs will be preempted
# available options are REQUEUE, SUSPEND, CHECKPOINT
PREEMPTPOLICY         SUSPEND

# How should Maui handle jobs that utilize more resources
# than they requested.
RESOURCELIMITPOLICY   MEM:EXTENDEDVIOLATION:CANCEL

# Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
USERCFG[DEFAULT]      FSTARGET=5
GROUPCFG[cs_visitors] MAXJOB=2
#GROUPCFG[ensc_ugrad] MAXJOB=8,16
# SRCFG[administrative] PERIOD=INFINITY
# SRCFG[administrative] STARTTIME=0:00:00 ENDTIME=24:00:00
# SRCFG[administrative] USERLIST=jpeltier
#                       HOSTLIST=a02-nll,a03-nll,a04-nll,a05-nll,a06-nll,a07-nll,a08-nll

QOSCFG[high]   PRIORITY=5000
QOSCFG[normal] PRIORITY=0
QOSCFG[low]    PRIORITY=-5000
QOSCFG[long]   MAXJOB=4
QOSCFG[debug]  WALLTIME=01:00:00

# ensure that some nodes are still able to run
# within a 24 hour period
SHORTPOOLPOLICY   ON
SHORTPOOLMAXTIME  86400
SHORTPOOLMINSIZE  128

#### JOB SUBMISSION ####
qsub -l qos=long -I
qsub: waiting for job 75564.queen to start
qsub: job 75564.queen ready

#### CHECKJOB 75564 ####
checking job 75564

State: Running
Creds:  user:jpeltier  group:staff  class:batch  qos:normal
WallTime: 00:00:00 of 1:00:00
SubmitTime: Tue Apr 14 11:05:32
  (Time Queued  Total: 00:00:01  Eligible: 00:00:01)

StartTime: Tue Apr 14 11:05:33
Total Tasks: 1

Req[0]  TaskCount: 1  Partition: DEFAULT
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
Dedicated Resources Per Task: PROCS: 1  MEM: 1024M
NodeCount: 1
Allocated Nodes:
[sdats1:1]

IWD: [NONE]  Executable: [NONE]
Bypass: 0  StartCount: 1
PartitionMask: [ALL]
Reservation '75564' (00:00:00 -> 1:00:00  Duration: 1:00:00)
PE:  1.00  StartPriority:  251

#### qstat -f 75564 ####
Job Id: 75564.queen
    Job_Name = STDIN
    Job_Owner = jpelt...@queen
    resources_used.cput = 00:00:00
    resources_used.mem = 9532kb
    resources_used.vmem = 114208kb
    resources_used.walltime = 00:04:23
    job_state = R
    queue = batch
    server = queen
    Checkpoint = u
    ctime = Tue Apr 14 11:05:32 2009
    Error_Path = /dev/pts/1
    exec_host = sdats1/0
    Hold_Types = n
    interactive = True
    Join_Path = n
    Keep_Files = n
    Mail_Points = a
    mtime = Tue Apr 14 11:05:33 2009
    Output_Path = /dev/pts/1
    Priority = 0
    qtime = Tue Apr 14 11:05:32 2009
    Rerunable = False
    Resource_List.mem = 1gb
    Resource_List.ncpus = 1
    Resource_List.neednodes = 1
    Resource_List.nodect = 1
    Resource_List.nodes = 1
    Resource_List.qos = long
    Resource_List.walltime = 01:00:00
    session_id = 30924
    substate = 42
    Variable_List = PBS_O_HOME=/home/fas3/jpeltier,PBS_O_LANG=en_US.UTF-8,
        PBS_O_LOGNAME=jpeltier,
        PBS_O_PATH=/usr/lib/qt-3.3/bin:/usr/kerberos/bin:/usr/local/bin:/bin:
        /usr/bin:/usr/X11R6/bin,PBS_O_MAIL=/var/spool/mail/jpeltier,
        PBS_O_SHELL=/bin/tcsh,PBS_SERVER=queen,PBS_O_HOST=queen,
        PBS_O_WORKDIR=/home/fas3/jpeltier/testing,PBS_O_QUEUE=batch
    euser = jpeltier
    egroup = staff
    hashname = 75564.queen
    queue_rank = 81809
    queue_type = E
    etime = Tue Apr 14 11:05:32 2009
    submit_args = -l qos=long -I
    start_time = Tue Apr 14 11:05:33 2009
    start_count = 1

--
James A. Peltier
Systems Analyst (FASNet), VIVARIUM Technical Director
Simon Fraser University - Burnaby Campus
Phone   : 778-782-6573
Fax     : 778-782-3045
E-Mail  : [email protected]
Website : http://www.fas.sfu.ca | http://vivarium.cs.sfu.ca
          http://blogs.sfu.ca/people/jpeltier
MSN     : [email protected]

The point of the HPC scheduler is to keep everyone equally unhappy.
