2012/10/17 Ian Miller <[email protected]>:
> Hi
> I have maui version 3.3.1 and torque version 2.5.7
> and I seem to have a few nodes sitting idle that should be running jobs.
> They have been able to run jobs in the past but the cluster has never run at
> 80-90%
> The output of showq is as follows (I omitted the jobs lists)
>
> 119 Active Jobs 130 of 344 Processors Active (37.79%)
>
> 15 of 35 Nodes Active (42.86%)
>
> Total Jobs: 467 Active Jobs: 119 Idle Jobs: 0 Blocked Jobs: 348
>
> When I try to force run a job.. I get ….
>
> root@beast$ qrun 209054
>
> qrun: Execution server rejected request MSG=cannot send job to mom,
> state=PRERUN 209054.beast-net
>
> 30 out of the 34 worker nodes are in one queue (batch) with 2 out of the 30
> shared between another queue. Currently 33 of the total jobs (467) are in
> a different queue (short) and are running fine, the rest are in the
> default (batch). My question is how can I get the idle nodes to run these
> jobs?
>
> What might be the problem?

Try restarting the mom services at the empty nodes.

>
> Qmgr: print queue batch
>
> # Create queues and set their attributes.
> #
> #
> # Create and define queue batch
> #
> create queue batch
> set queue batch queue_type = Execution
> set queue batch max_running = 200
> set queue batch resources_default.neednodes = batch
> set queue batch resources_default.nodes = 1
> set queue batch max_user_run = 150
> set queue batch keep_completed = 300
> set queue batch enabled = True
> set queue batch started = True
>
> # maui.cfg 3.3.1
>
> SERVERHOST beast
> # primary admin must be first in list
> ADMIN1 root
>
> # Resource Manager Definition
> RMCFG[BEAST] TYPE=PBS
>
> # Allocation Manager Definition
> AMCFG[bank] TYPE=NONE
>
> # full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
> # use the 'schedctl -l' command to display current configuration
> RMPOLLINTERVAL 00:00:30
> SERVERPORT 42559
> SERVERMODE NORMAL
>
> # Admin: http://supercluster.org/mauidocs/a.esecurity.html
> LOGFILE maui.log
> LOGFILEMAXSIZE 10000000
> LOGLEVEL 3
>
> # Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
> QUEUETIMEWEIGHT 1
>
> # FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
> #FSPOLICY PSDEDICATED
> #FSDEPTH 7
> #FSINTERVAL 86400
> #FSDECAY 0.80
>
> # Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
> # NONE SPECIFIED
>
> # Backfill: http://supercluster.org/mauidocs/8.2backfill.html
> BACKFILLPOLICY FIRSTFIT
> RESERVATIONPOLICY CURRENTHIGHEST
>
> # Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
> NODEALLOCATIONPOLICY PRIORITY
> NODECFG[DEFAULT] PRIORITYF='0.01*AMEM - 2*LOAD'
> NODEAVAILABILITYPOLICY COMBINED:MEM
>
> SRCFG[Reinitz] HOSTLIST=minion1[2-9]
> SRCFG[Reinitz] GROUPLIST=Reinitz
>
> # QOS: http://supercluster.org/mauidocs/7.3qos.html
> # QOSCFG[hi] PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
> # QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
>
> # Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html
> # SRSTARTTIME[test] 8:00:00
> # SRENDTIME[test] 17:00:00
> # SRDAYS[test] MON TUE WED THU FRI
> # SRTASKCOUNT[test] 20
> # SRMAXTIME[test] 0:30:00
>
> # Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
> USERCFG[DEFAULT] MAXIJOB=2000
> # USERCFG[DEFAULT] FSTARGET=25.0
> # USERCFG[john] PRIORITY=100 FSTARGET=10.0-
> # GROUPCFG[staff] PRIORITY=1000 QLIST=hi:low QDEF=hi
> # CLASSCFG[batch] FLAGS=PREEMPTEE
> # CLASSCFG[interactive] FLAGS=PREEMPTOR
>
> Ian Miller
> Research Computing Administrator
> [email protected]
> (312) 402-6170
>
> _______________________________________________
> mauiusers mailing list
> [email protected]
> http://www.supercluster.org/mailman/listinfo/mauiusers
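
To make the suggestion to restart the mom services a bit more concrete, here is a minimal sketch. It rests on several assumptions that are not in the original message: root ssh access from the server to the worker nodes, a pbs_mom init script named "pbs_mom" (the name varies between installs), and a NODES variable that you fill in yourself with the idle hosts (the host names in the usage line are hypothetical). It defaults to a dry run that only prints what it would do.

```shell
#!/bin/sh
# Sketch only: restart pbs_mom on idle nodes, then retry the stuck job.
# Assumptions (not from the original post): root ssh to the nodes works and
# the MOM init script is called "pbs_mom".

NODES="${NODES:-}"        # space-separated idle hosts, supplied by the admin
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Restart the MOM daemon on each idle node.
for node in $NODES; do
    run ssh "$node" service pbs_mom restart
done

# List nodes the TORQUE server still considers down or offline.
run pbsnodes -l

# Retry the job that was rejected with state=PRERUN.
run qrun 209054
```

Usage would look something like `NODES="nodeA nodeB" DRY_RUN=0 sh restart_moms.sh` once the dry-run output looks right. If qrun is still rejected afterwards, `checkjob 209054` on the Maui side may show why the job stays blocked.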
--
Denis Anjos, www.versatushpc.com.br
