Dear list,

I am trying to set up a very basic Torque+Maui system. I have been running a Torque cluster for a year now and want to improve the scheduling with Maui. To this end, I installed a fresh test system, with server and node on a single computer.

Torque version: 2.4.16
Maui version: 3.3.1
uname: Linux testing 3.2.0-20-generic #33-Ubuntu SMP Tue Mar 27 16:42:26 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

I was able to run (simple) jobs with the Torque scheduler. Since I replaced the scheduler with Maui, jobs stay queued. Jobs are submitted with:

$ qsub -q batch test-script.sh

where test-script.sh is nothing more than a 'sleep 1m' script. Checking the job:

# checkjob -v 55
checking job 55 (RM job '55.testing.azr.nl')

State: Idle EState: Deferred
Creds: user:sebastiaan group:sebastiaan class:batch qos:DEFAULT
WallTime: 00:00:00 of 6:00:00
SubmitTime: Thu Apr 5 13:21:33
  (Time Queued Total: 00:00:32 Eligible: 00:00:01)

Total Tasks: 1

Req[0] TaskCount: 1 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 15G
Opsys: [NONE] Arch: [NONE] Features: [1][ppn=1]
Exec: '' ExecSize: 0 ImageSize: 0
Dedicated Resources Per Task: PROCS: 1 MEM: 2000M SWAP: 15G
NodeAccess: SHARED
NodeCount: 1

IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 0
PartitionMask: [ALL]
Flags: RESTARTABLE

job is deferred. Reason: NoResources (cannot create reservation for job '55' (intital reservation attempt)
)
Holds: Defer (hold reason: NoResources)
PE: 16.03 StartPriority: 1
cannot select job 55 for partition DEFAULT (job hold active)

This shows that there are no resources available. However, the node is free and unloaded:

# checknode testing
checking node testing.azr.nl

State: Idle (in current state for 2:23:54)
Configured Resources: PROCS: 2 MEM: 984M SWAP: 1996M DISK: 1M
Utilized Resources: SWAP: 149M
Dedicated Resources: [NONE]
Opsys: linux Arch: [NONE]
Speed: 1.00 Load: 0.050
Network: [DEFAULT]
Features: [NONE]
Attributes: [Batch]
Classes: [batch 2:2]

Total Time: 16:11:49 Up: 16:11:49 (100.00%) Active: 00:01:00 (0.10%)

Reservations:
NOTE: no reservations on node

When the job is added, maui.log shows this:

04/05 13:21:34 MPBSJobLoad(55,55.testing.azr.nl,J,TaskList,0)
04/05 13:21:34 MReqCreate(55,SrcRQ,DstRQ,DoCreate)
04/05 13:21:34 INFO: processing node request line '1'
04/05 13:21:34 MJobSetCreds(55,sebastiaan,sebastiaan,)
04/05 13:21:34 INFO: default QOS for job 55 set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
04/05 13:21:34 INFO: default QOS for job 55 set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
04/05 13:21:34 INFO: default QOS for job 55 set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
04/05 13:21:34 INFO: job '55' loaded: 1 sebastiaan sebastiaan 21600 Idle 0 1333624893 [NONE] [NONE] [NONE] >= 0 >= 0 [1][ppn=1] 1333624894
04/05 13:21:34 INFO: 12 PBS jobs detected on RM TESTING
04/05 13:21:34 INFO: jobs detected: 12
04/05 13:21:34 MStatClearUsage(node,Active)
04/05 13:21:34 MClusterUpdateNodeState()
04/05 13:21:34 MQueueSelectAllJobs(Q,HARD,ALL,JIList,DP,Msg)
04/05 13:21:34 INFO: job '40' Priority: 22
04/05 13:21:34 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 22(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
04/05 13:21:34 INFO: job '41' Priority: 22
04/05 13:21:34 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 22(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
04/05 13:21:34 INFO: job '42' Priority: 22
04/05 13:21:34 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 22(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
04/05 13:21:34 INFO: job '44' Priority: 22
04/05 13:21:34 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 22(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
04/05 13:21:34 INFO: job '45' Priority: 22
04/05 13:21:34 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 22(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
04/05 13:21:34 INFO: job '47' Priority: 22
04/05 13:21:34 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 22(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
04/05 13:21:34 INFO: job '48' Priority: 16
04/05 13:21:34 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 16(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
04/05 13:21:34 INFO: job '49' Priority: 12
04/05 13:21:34 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 12(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
04/05 13:21:34 INFO: job '52' Priority: 8
04/05 13:21:34 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 8(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
04/05 13:21:34 INFO: job '53' Priority: 1
04/05 13:21:34 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 0(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
04/05 13:21:34 INFO: job '54' Priority: 60
04/05 13:21:34 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 60(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
04/05 13:21:34 INFO: job '55' Priority: 1
04/05 13:21:34 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 0(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
04/05 13:21:34 MStatClearUsage([NONE],Active)
04/05 13:21:34 MResDestroy(NULL)
04/05 13:21:34 MResDestroy(NULL)
04/05 13:21:34 MResDestroy(NULL)
04/05 13:21:34 MResDestroy(NULL)
04/05 13:21:34 MResDestroy(NULL)
04/05 13:21:34 MResDestroy(NULL)
04/05 13:21:34 MResDestroy(NULL)
04/05 13:21:34 MResDestroy(NULL)
04/05 13:21:34 MResDestroy(NULL)
04/05 13:21:34 MResDestroy(NULL)
04/05 13:21:34 MResDestroy(NULL)
04/05 13:21:34 INFO: total jobs selected (ALL): 1/12 [EState: 11]
04/05 13:21:34 MQueueSelectAllJobs(Q,SOFT,ALL,JIList,DP,Msg)
04/05 13:21:34 INFO: job '40' Priority: 22
04/05 13:21:34 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 22(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
04/05 13:21:34 INFO: job '41' Priority: 22
04/05 13:21:34 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 22(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
04/05 13:21:34 INFO: job '42' Priority: 22
04/05 13:21:34 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 22(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
04/05 13:21:34 INFO: job '44' Priority: 22
04/05 13:21:34 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 22(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
04/05 13:21:34 INFO: job '45' Priority: 22
04/05 13:21:34 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 22(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
04/05 13:21:34 INFO: job '47' Priority: 22
04/05 13:21:34 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 22(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
04/05 13:21:34 INFO: job '48' Priority: 16
04/05 13:21:34 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 16(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
04/05 13:21:34 INFO: job '49' Priority: 12
04/05 13:21:34 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 12(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
04/05 13:21:34 INFO: job '52' Priority: 8
04/05 13:21:34 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 8(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
04/05 13:21:34 INFO: job '53' Priority: 1
04/05 13:21:34 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 0(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
04/05 13:21:34 INFO: job '54' Priority: 60
04/05 13:21:34 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 60(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
04/05 13:21:34 INFO: job '55' Priority: 1
04/05 13:21:34 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 0(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
04/05 13:21:34 MStatClearUsage([NONE],Idle)
04/05 13:21:34 MResDestroy(NULL)
04/05 13:21:34 MResDestroy(NULL)
04/05 13:21:34 MResDestroy(NULL)
04/05 13:21:34 MResDestroy(NULL)
04/05 13:21:34 MResDestroy(NULL)
04/05 13:21:34 MResDestroy(NULL)
04/05 13:21:34 MResDestroy(NULL)
04/05 13:21:34 MResDestroy(NULL)
04/05 13:21:34 MResDestroy(NULL)
04/05 13:21:34 MResDestroy(NULL)
04/05 13:21:34 MResDestroy(NULL)
04/05 13:21:34 INFO: total jobs selected (ALL): 1/12 [EState: 11]
04/05 13:21:34 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,FALSE)
04/05 13:21:34 INFO: total jobs selected in partition ALL: 1/1
04/05 13:21:34 MQueueScheduleRJobs(Q)
04/05 13:21:34 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
04/05 13:21:34 INFO: total jobs selected in partition ALL: 1/1
04/05 13:21:34 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,DEFAULT,FReason,TRUE)
04/05 13:21:34 INFO: total jobs selected in partition DEFAULT: 1/1
04/05 13:21:34 MQueueScheduleIJobs(Q,DEFAULT)
04/05 13:21:34 INFO: 0 feasible tasks found for job 55:0 in partition DEFAULT (1 Needed)
04/05 13:21:34 MJobPReserve(55,DEFAULT,ResCount,ResCountRej)
04/05 13:21:34 MJobReserve(55,Priority)
04/05 13:21:34 ALERT: job 55 cannot run in any partition
04/05 13:21:34 ALERT: cannot create new reservation for job 55 (shape[1] 1)
04/05 13:21:34 ALERT: cannot create new reservation for job 55
04/05 13:21:34 MJobSetHold(55,16,1:00:00,NoResources,cannot create reservation for job '55' (intital reservation attempt)
)
04/05 13:21:34 ALERT: job '55' cannot run (deferring job for 3600 seconds)
04/05 13:21:34 WARNING: cannot reserve priority job '55'
Active Jobs------
------------------
04/05 13:21:34 INFO: resources available after scheduling: N: 1 P: 2
04/05 13:21:34 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,DEFAULT,FReason,TRUE)
04/05 13:21:34 INFO: total jobs selected in partition DEFAULT: 0/1 [EState: 1]
04/05 13:21:34 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,TRUE)
04/05 13:21:34 INFO: total jobs selected in partition ALL: 0/1 [EState: 1]
04/05 13:21:34 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
04/05 13:21:34 INFO: total jobs selected in partition ALL: 0/1 [EState: 1]
04/05 13:21:34 MSchedUpdateStats()
04/05 13:21:34 INFO: iteration: 288 scheduling time: 0.002 seconds
04/05 13:21:34 MResUpdateStats()
04/05 13:21:34 INFO: current util[288]: 0/1 (0.00%) PH: 0.00% active jobs: 0 of 2 (completed: 1)
04/05 13:21:34 MQueueCheckStatus()
04/05 13:21:34 MNodeCheckStatus()
04/05 13:21:34 MUClearChild(PID)
04/05 13:21:34 INFO: scheduling complete. sleeping 30 seconds

I think the relevant line is:

04/05 13:21:34 INFO: 0 feasible tasks found for job 55:0 in partition DEFAULT (1 Needed)

but I have no idea how to make the job's task feasible. I have tried submitting with -l nodes=1:ppn=1, -l walltime=2:00:00, etc., but none of these seems to have had any effect.
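
The numbers in the checkjob and checknode output above can be compared directly. The following is a minimal Python sketch, not a description of Maui's actual feasibility test: it assumes that a node is feasible for a task only if each configured resource is at least the per-task dedicated request, and that 1G equals 1024M. The function names to_mb and shortfalls are hypothetical; the values are copied from the output above.

```python
# Sketch only: compares the "Dedicated Resources Per Task" line from checkjob
# with the "Configured Resources" line from checknode.
# Assumption: 1G = 1024M; sizes are handled in megabytes.
UNITS = {"M": 1, "G": 1024}


def to_mb(value: str) -> int:
    """Convert a size such as '984M' or '15G' to megabytes."""
    return int(value[:-1]) * UNITS[value[-1]]


# Per-task request reported by 'checkjob -v 55'.
job_per_task = {"PROCS": 1, "MEM": to_mb("2000M"), "SWAP": to_mb("15G")}

# Configured resources reported by 'checknode testing'.
node_configured = {"PROCS": 2, "MEM": to_mb("984M"), "SWAP": to_mb("1996M")}


def shortfalls(request: dict, available: dict) -> dict:
    """Return {resource: (requested, available)} for each resource the node lacks."""
    return {
        name: (amount, available.get(name, 0))
        for name, amount in request.items()
        if amount > available.get(name, 0)
    }


if __name__ == "__main__":
    # MEM and SWAP are in MB; PROCS is a plain count.
    for name, (want, have) in shortfalls(job_per_task, node_configured).items():
        print(f"{name}: job requests {want} per task, node has {have}")
```

Under those assumptions the check flags MEM and SWAP but not PROCS, which would at least be consistent with the NoResources hold reported by checkjob.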
Torque config. I have tried setting different attributes to the queue properties, hoping that it would have some effect:

# qmgr -c "p s"
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch Priority = 20
set queue batch max_running = 8
set queue batch resources_max.ncpus = 8
set queue batch resources_max.nodect = 10
set queue batch resources_max.nodes = 2
set queue batch resources_min.ncpus = 0
set queue batch resources_default.mem = 2000mb
set queue batch resources_default.ncpus = 1
set queue batch resources_default.neednodes = 1:ppn=1
set queue batch resources_default.nodect = 1
set queue batch resources_default.nodes = 1
set queue batch resources_default.pvmem = 16000mb
set queue batch resources_default.walltime = 06:00:00
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = testing.azr.nl
set server log_events = 511
set server mail_from = adm
set server resources_available.nodect = 10
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server next_job_number = 56

Maui configuration, untouched:

# maui.cfg 3.3.1

SERVERHOST testing
# primary admin must be first in list
ADMIN1 root

# Resource Manager Definition

RMCFG[TESTING] TYPE=PBS

# Allocation Manager Definition

AMCFG[bank] TYPE=NONE

# full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
# use the 'schedctl -l' command to display current configuration

RMPOLLINTERVAL 00:00:30

SERVERPORT 42559
SERVERMODE NORMAL

# Admin: http://supercluster.org/mauidocs/a.esecurity.html

LOGFILE maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 3

# Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html

QUEUETIMEWEIGHT 1

# FairShare: http://supercluster.org/mauidocs/6.3fairshare.html

#FSPOLICY PSDEDICATED
#FSDEPTH 7
#FSINTERVAL 86400
#FSDECAY 0.80

# Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html

# NONE SPECIFIED

# Backfill: http://supercluster.org/mauidocs/8.2backfill.html

BACKFILLPOLICY FIRSTFIT
RESERVATIONPOLICY CURRENTHIGHEST

# Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html

NODEALLOCATIONPOLICY MINRESOURCE

# QOS: http://supercluster.org/mauidocs/7.3qos.html

# QOSCFG[hi] PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
# QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE

# Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html

# SRSTARTTIME[test] 8:00:00
# SRENDTIME[test] 17:00:00
# SRDAYS[test] MON TUE WED THU FRI
# SRTASKCOUNT[test] 20
# SRMAXTIME[test] 0:30:00

# Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html

# USERCFG[DEFAULT] FSTARGET=25.0
# USERCFG[john] PRIORITY=100 FSTARGET=10.0-
# GROUPCFG[staff] PRIORITY=1000 QLIST=hi:low QDEF=hi
# CLASSCFG[batch] FLAGS=PREEMPTEE
# CLASSCFG[interactive] FLAGS=PREEMPTOR

Any ideas?

Thanks in advance,
Sebastiaan

--
Sebastiaan Breedveld, MSc.
Ph.D. student
Erasmus MC - Daniel den Hoed Cancer Center
Department of Radiation Oncology
Groene Hilledijk 301
3075 EA Rotterdam
The Netherlands
Phone: +31 10 7042693
Room: Gs-20
