2012/10/17 Ian Miller <[email protected]>:
> Thx
> That was the fix.
>
> Ian Miller
> Research Computing Administrator
> [email protected]
> (312) 402-6170
>

You're very welcome.

D.

>
> On 10/17/12 1:26 PM, "Denis" <[email protected]> wrote:
>
>>2012/10/17 Ian Miller <[email protected]>:
>>> Hi
>>> I have maui version 3.3.1 and torque version 2.5.7,
>>> and I seem to have a few nodes sitting idle that should be running jobs.
>>> They have been able to run jobs in the past, but the cluster has never
>>> run at 80-90%.
>>> The output of showq is as follows (I omitted the job lists):
>>>
>>> 119 Active Jobs    130 of 344 Processors Active (37.79%)
>>>
>>>                     15 of 35 Nodes Active (42.86%)
>>>
>>> Total Jobs: 467   Active Jobs: 119   Idle Jobs: 0   Blocked Jobs: 348
>>>
>>> When I try to force-run a job, I get ...
>>>
>>> root@beast$ qrun 209054
>>>
>>> qrun: Execution server rejected request MSG=cannot send job to mom,
>>> state=PRERUN 209054.beast-net
>>>
>>> 30 of the 34 worker nodes are in one queue (batch), with 2 of those 30
>>> shared with another queue. Currently 33 of the total jobs (467) are in
>>> a different queue (short) and are running fine; the rest are in the
>>> default (batch). My question is: how can I get the idle nodes to run
>>> these jobs?
>>>
>>> What might be the problem?
>>>
>>Try restarting the mom services on the empty nodes.
>>>
>>>
>>> Qmgr: print queue batch
>>>
>>> # Create queues and set their attributes.
>>> #
>>> #
>>> # Create and define queue batch
>>> #
>>> create queue batch
>>> set queue batch queue_type = Execution
>>> set queue batch max_running = 200
>>> set queue batch resources_default.neednodes = batch
>>> set queue batch resources_default.nodes = 1
>>> set queue batch max_user_run = 150
>>> set queue batch keep_completed = 300
>>> set queue batch enabled = True
>>> set queue batch started = True
>>>
>>>
>>> # maui.cfg 3.3.1
>>>
>>> SERVERHOST            beast
>>>
>>> # primary admin must be first in list
>>> ADMIN1                root
>>>
>>> # Resource Manager Definition
>>> RMCFG[BEAST] TYPE=PBS
>>>
>>> # Allocation Manager Definition
>>> AMCFG[bank]  TYPE=NONE
>>>
>>> # full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
>>> # use the 'schedctl -l' command to display current configuration
>>>
>>> RMPOLLINTERVAL        00:00:30
>>> SERVERPORT            42559
>>> SERVERMODE            NORMAL
>>>
>>> # Admin: http://supercluster.org/mauidocs/a.esecurity.html
>>>
>>> LOGFILE               maui.log
>>> LOGFILEMAXSIZE        10000000
>>> LOGLEVEL              3
>>>
>>> # Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
>>>
>>> QUEUETIMEWEIGHT       1
>>>
>>> # FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
>>>
>>> #FSPOLICY              PSDEDICATED
>>> #FSDEPTH               7
>>> #FSINTERVAL            86400
>>> #FSDECAY               0.80
>>>
>>> # Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
>>>
>>> # NONE SPECIFIED
>>>
>>> # Backfill: http://supercluster.org/mauidocs/8.2backfill.html
>>>
>>> BACKFILLPOLICY        FIRSTFIT
>>> RESERVATIONPOLICY     CURRENTHIGHEST
>>>
>>> # Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
>>>
>>> NODEALLOCATIONPOLICY  PRIORITY
>>> NODECFG[DEFAULT] PRIORITYF='0.01*AMEM - 2*LOAD'
>>> NODEAVAILABILITYPOLICY COMBINED:MEM
>>>
>>> SRCFG[Reinitz] HOSTLIST=minion1[2-9]
>>> SRCFG[Reinitz] GROUPLIST=Reinitz
>>>
>>> # QOS: http://supercluster.org/mauidocs/7.3qos.html
>>>
>>> # QOSCFG[hi]  PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
>>> # QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
>>>
>>> # Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html
>>>
>>> # SRSTARTTIME[test] 8:00:00
>>> # SRENDTIME[test]   17:00:00
>>> # SRDAYS[test]      MON TUE WED THU FRI
>>> # SRTASKCOUNT[test] 20
>>> # SRMAXTIME[test]   0:30:00
>>>
>>> # Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
>>>
>>> USERCFG[DEFAULT]      MAXIJOB=2000
>>> # USERCFG[DEFAULT]      FSTARGET=25.0
>>> # USERCFG[john]         PRIORITY=100  FSTARGET=10.0-
>>> # GROUPCFG[staff]       PRIORITY=1000 QLIST=hi:low QDEF=hi
>>> # CLASSCFG[batch]       FLAGS=PREEMPTEE
>>> # CLASSCFG[interactive] FLAGS=PREEMPTOR
>>>
>>>
>>> Ian Miller
>>> Research Computing Administrator
>>> [email protected]
>>> (312) 402-6170
>>>
>>>
>>> _______________________________________________
>>> mauiusers mailing list
>>> [email protected]
>>> http://www.supercluster.org/mailman/listinfo/mauiusers
>>>
>>
>>
>>--
>>Denis Anjos,
>>www.versatushpc.com.br
>
-- 
Denis Anjos,
www.versatushpc.com.br
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
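
For anyone who lands on this thread with the same symptom: the fix was to restart the mom services on the nodes that sat empty, so the first step is working out which nodes those are. Below is a minimal sketch, not a tested tool. It assumes node listings shaped like the output of `pbsnodes -a` (node name at the start of a line, indented `key = value` attributes, a `jobs` attribute only when something is running). The function name `idle_nodes` and the sample node names and states are made up for illustration; nothing here talks to TORQUE or Maui.

```python
def idle_nodes(pbsnodes_text):
    """Return sorted names of nodes that report state 'free' and list no jobs.

    Assumes text shaped like `pbsnodes -a` output: an unindented node name
    followed by indented "key = value" attribute lines.
    """
    nodes = {}
    current = None
    for line in pbsnodes_text.splitlines():
        if not line.strip():
            # A blank line ends the current node's attribute block.
            current = None
            continue
        if not line[0].isspace():
            # An unindented line starts a new node record.
            current = line.strip()
            nodes[current] = {}
        elif current is not None and "=" in line:
            # Split on the first "=" only; values may contain "=" themselves.
            key, _, value = line.partition("=")
            nodes[current][key.strip()] = value.strip()
    return sorted(
        name
        for name, attrs in nodes.items()
        if attrs.get("state") == "free" and not attrs.get("jobs")
    )


# Hypothetical sample input; these nodes and job IDs are invented.
SAMPLE = """\
node01
     state = free
     np = 8
     properties = batch
     jobs = 0/1001.example-server

node02
     state = free
     np = 8
     properties = batch

node03
     state = down
     np = 8
     properties = batch
"""

if __name__ == "__main__":
    for name in idle_nodes(SAMPLE):
        print(name)
```

With the sample above, only `node02` is reported: `node01` is free but already has a job, and `node03` is down. The resulting list is just a set of candidates; whether restarting pbs_mom on them helps depends on why the server could not send jobs to those moms in the first place.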
