Hi
I have Maui version 3.3.1 and TORQUE version 2.5.7
and I seem to have a few nodes sitting idle that should be running jobs. They
have been able to run jobs in the past, but the cluster has never run at 80-90%
utilization. The output of showq is as follows (I omitted the job lists):
119 Active Jobs 130 of 344 Processors Active (37.79%)
15 of 35 Nodes Active (42.86%)
Total Jobs: 467 Active Jobs: 119 Idle Jobs: 0 Blocked Jobs: 348
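To double-check the numbers in that summary, here is a minimal Python sketch that parses the three lines and recomputes the utilization. The function name parse_showq_summary and the regular expressions are my own and assume the summary format shown above; they are not part of Maui.

```python
import re

# Summary lines copied from the showq output above.
SHOWQ_SUMMARY = """\
119 Active Jobs 130 of 344 Processors Active (37.79%)
15 of 35 Nodes Active (42.86%)
Total Jobs: 467 Active Jobs: 119 Idle Jobs: 0 Blocked Jobs: 348
"""


def parse_showq_summary(text):
    """Extract processor, node, and job counts from a showq summary (assumed format)."""
    procs = re.search(r"(\d+) of (\d+) Processors Active", text)
    nodes = re.search(r"(\d+) of (\d+) Nodes Active", text)
    # Only the "Name Jobs: N" pairs on the last line carry a colon.
    jobs = {
        name.lower(): int(count)
        for name, count in re.findall(r"(Total|Active|Idle|Blocked) Jobs: (\d+)", text)
    }
    procs_active, procs_total = int(procs.group(1)), int(procs.group(2))
    nodes_active, nodes_total = int(nodes.group(1)), int(nodes.group(2))
    return {
        "procs_idle": procs_total - procs_active,
        "nodes_idle": nodes_total - nodes_active,
        "proc_util": 100.0 * procs_active / procs_total,
        "node_util": 100.0 * nodes_active / nodes_total,
        "jobs": jobs,
    }


summary = parse_showq_summary(SHOWQ_SUMMARY)
print("idle processors:", summary["procs_idle"])
print("idle nodes:", summary["nodes_idle"])
print("processor utilization: %.2f%%" % summary["proc_util"])
print("node utilization: %.2f%%" % summary["node_util"])
print("blocked jobs:", summary["jobs"]["blocked"])
```

So 214 processors on 20 nodes are idle while 348 jobs are blocked and none are idle (eligible), which is what makes me think the jobs are being held back by a policy or a node state rather than by a lack of resources.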
When I try to force-run a job, I get:
root@beast$ qrun 209054
qrun: Execution server rejected request MSG=cannot send job to mom, state=PRERUN 209054.beast-net
30 of the 34 worker nodes are in one queue (batch), and 2 of those 30 are
shared with another queue. Currently 33 of the 467 total jobs are in a
different queue (short) and are running fine; the rest are in the
default queue (batch). My question is: how can I get the idle nodes to run
these jobs? What might be the problem?
Qmgr: print queue batch
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch max_running = 200
set queue batch resources_default.neednodes = batch
set queue batch resources_default.nodes = 1
set queue batch max_user_run = 150
set queue batch keep_completed = 300
set queue batch enabled = True
set queue batch started = True
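As a rough sanity check that the queue limits above are not what is holding jobs back, here is a small Python sketch comparing them with the showq count. parse_queue_attrs is a hypothetical helper of my own, and 119 is the total number of active jobs across all queues, so it is only an upper bound for batch.

```python
# "set" lines copied from the qmgr output above.
QMGR_OUTPUT = """\
set queue batch queue_type = Execution
set queue batch max_running = 200
set queue batch resources_default.neednodes = batch
set queue batch resources_default.nodes = 1
set queue batch max_user_run = 150
set queue batch keep_completed = 300
set queue batch enabled = True
set queue batch started = True
"""


def parse_queue_attrs(text, queue):
    """Collect 'set queue <queue> key = value' lines into a dict of strings."""
    prefix = "set queue %s " % queue
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(prefix) and " = " in line:
            key, value = line[len(prefix):].split(" = ", 1)
            attrs[key.strip()] = value.strip()
    return attrs


attrs = parse_queue_attrs(QMGR_OUTPUT, "batch")

# Upper bound: all active jobs reported by showq, including those in "short".
active_jobs = 119
max_running = int(attrs["max_running"])
max_user_run = int(attrs["max_user_run"])

print("queue enabled and started:", attrs["enabled"] == "True" and attrs["started"] == "True")
print("below max_running:", active_jobs < max_running)
print("below max_user_run for any single user:", active_jobs < max_user_run)
print("jobs default to nodes with property:", attrs["resources_default.neednodes"])
```

If I am reading this right, the queue is enabled and started, and neither max_running nor max_user_run can have been reached with only 119 jobs running. The one setting I am unsure about is resources_default.neednodes = batch, which as far as I understand restricts these jobs to nodes carrying the batch property.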
# maui.cfg 3.3.1
SERVERHOST beast
# primary admin must be first in list
ADMIN1 root
# Resource Manager Definition
RMCFG[BEAST] TYPE=PBS
# Allocation Manager Definition
AMCFG[bank] TYPE=NONE
# full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
# use the 'schedctl -l' command to display current configuration
RMPOLLINTERVAL 00:00:30
SERVERPORT 42559
SERVERMODE NORMAL
# Admin: http://supercluster.org/mauidocs/a.esecurity.html
LOGFILE maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 3
# Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
QUEUETIMEWEIGHT 1
# FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
#FSPOLICY PSDEDICATED
#FSDEPTH 7
#FSINTERVAL 86400
#FSDECAY 0.80
# Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
# NONE SPECIFIED
# Backfill: http://supercluster.org/mauidocs/8.2backfill.html
BACKFILLPOLICY FIRSTFIT
RESERVATIONPOLICY CURRENTHIGHEST
# Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
NODEALLOCATIONPOLICY PRIORITY
NODECFG[DEFAULT] PRIORITYF='0.01*AMEM - 2*LOAD'
NODEAVAILABILITYPOLICY COMBINED:MEM
SRCFG[Reinitz] HOSTLIST=minion1[2-9]
SRCFG[Reinitz] GROUPLIST=Reinitz
# QOS: http://supercluster.org/mauidocs/7.3qos.html
# QOSCFG[hi] PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
# QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
# Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html
# SRSTARTTIME[test] 8:00:00
# SRENDTIME[test] 17:00:00
# SRDAYS[test] MON TUE WED THU FRI
# SRTASKCOUNT[test] 20
# SRMAXTIME[test] 0:30:00
# Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
USERCFG[DEFAULT] MAXIJOB=2000
# USERCFG[DEFAULT] FSTARGET=25.0
# USERCFG[john] PRIORITY=100 FSTARGET=10.0-
# GROUPCFG[staff] PRIORITY=1000 QLIST=hi:low QDEF=hi
# CLASSCFG[batch] FLAGS=PREEMPTEE
# CLASSCFG[interactive] FLAGS=PREEMPTOR
Ian Miller
Research Computing Administrator
[email protected]
(312) 402-6170
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers