On Mar 15, 2006, at 8:55 AM, David Corredor wrote:
> I'm trying to set up some basic preemption with a "suspend" policy
> within Maui. The preemption part is working, except that the job that
> gets preempted (suspended) doesn't restart execution until all other
> jobs in the Idle queue have finished executing, even if those jobs
> don't have the preemptor flag set and, as far as I can tell, don't
> have a higher priority or xfactor than the suspended job either.
>
> Looking at the logs, it seems to me that while the first job was
> suspended and the preemptor was running, the next idle job in the
> queue (with the same priority as the suspended one) was somehow
> reserved the node next, so when the suspended job is supposed to
> restart, it doesn't find an available node.
>
> I would appreciate any hints in this regard.

I've been struggling with the same issue and was led to believe that
adding the following to my config would fix it:
FSPOLICY UTILIZEDPS
CONSUMEDWEIGHT 3
However, I have not found this to resolve anything. Here is some live
output from 'diagnose -p', edited to show only the suspended jobs:
# ./diagnose -p
diagnosing job priority information (partition: ALL)

Job        PRIORITY*   Cred( QOS:Class)   Serv(QTime)   Targ(QTime)    Res( Cons)
Weights     --------      5(   2:    8)      1(    1)      1(    1)      1(    3)

8469            3547   19.7( 10.0: 15.0)   80.3(2847.)    0.0(  0.0)    0.0(  0.0)
8968            2073   43.4( 10.0: 20.0)   56.6(1173.)    0.0(  0.0)    0.0(  0.0)
8969            2073   43.4( 10.0: 20.0)   56.6(1173.)    0.0(  0.0)    0.0(  0.0)
8970            2073   43.4( 10.0: 20.0)   56.6(1173.)    0.0(  0.0)    0.0(  0.0)
8971            2073   43.4( 10.0: 20.0)   56.6(1173.)    0.0(  0.0)    0.0(  0.0)
8972            2073   43.4( 10.0: 20.0)   56.6(1173.)    0.0(  0.0)    0.0(  0.0)
I would think that, with the parameters mentioned above enabled in my
maui.cfg, there should be some value listed in the "Res(Cons)" column
adding to a job's priority. If this were happening, suspended jobs,
which have already consumed CPU time, should acquire additional
priority points at a higher rate than idle jobs submitted at the same
time, so they would (hopefully) be resumed before any idle job in the
queue is started. I have not found this to be the case.
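The PRIORITY column above can be reproduced by hand, which makes it easier to see why the Res(Cons) column contributes nothing. Below is a minimal Python sketch, assuming each component is its weight times the weighted sum of its subcomponent values (an assumption about Maui's formula, though it matches every row shown). The Serv, Targ and Res component weights of 1 are read from the "Weights" row rather than from maui.cfg; the `priority` helper and the job tuples are hypothetical, for illustration only.

```python
# Hypothetical helper: recompute the PRIORITY column of `diagnose -p`
# ASSUMING priority = sum over components of
#   component weight * (sum of subcomponent weight * subcomponent value).

# Weights as shown in the "Weights" row of the output above.
CREDWEIGHT, QOSWEIGHT, CLASSWEIGHT = 5, 2, 8
SERVWEIGHT, QUEUETIMEWEIGHT = 1, 1
TARGWEIGHT, TARGETQUEUETIMEWEIGHT = 1, 1
RESWEIGHT, CONSUMEDWEIGHT = 1, 3


def priority(qos, cls, qtime, targ_qtime=0.0, consumed=0.0):
    """Return (total, per-component contributions) for one job."""
    cred = CREDWEIGHT * (QOSWEIGHT * qos + CLASSWEIGHT * cls)
    serv = SERVWEIGHT * (QUEUETIMEWEIGHT * qtime)
    targ = TARGWEIGHT * (TARGETQUEUETIMEWEIGHT * targ_qtime)
    # Res(Cons) only adds priority if the reported consumed value is nonzero.
    res = RESWEIGHT * (CONSUMEDWEIGHT * consumed)
    total = cred + serv + targ + res
    return total, {"Cred": cred, "Serv": serv, "Targ": targ, "Res": res}


# (job id, QOS, Class, QTime) values copied from the diagnose output.
for job, qos, cls, qtime in [("8469", 10.0, 15.0, 2847.0),
                             ("8968", 10.0, 20.0, 1173.0)]:
    total, parts = priority(qos, cls, qtime)
    # Percentage share of each component, as diagnose -p displays it.
    shares = {k: round(100 * v / total, 1) for k, v in parts.items()}
    print(job, total, shares)
```

With these inputs the sketch gives 3547 (Cred 19.7%, Serv 80.3%) for job 8469 and 2073 (Cred 43.4%, Serv 56.6%) for job 8968, matching the table. Because the Cons value reported for every job is 0.0, the Res term is 0 regardless of CONSUMEDWEIGHT, so the weight alone cannot lift the suspended jobs above the idle ones.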
Any info on how to solve this would be MUCH appreciated...
Just for kicks... here's my entire maui.cfg:
SERVERHOST node001.cluster
SERVERPORT 42559
SERVERMODE NORMAL
ADMIN1 root
LOGFILE maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 3
RMCFG[base] TYPE=PBS TIMEOUT=90
RMPOLLINTERVAL 00:00:10
BACKFILLPOLICY FIRSTFIT
NODEALLOCATIONPOLICY MINRESOURCE
NODEACCESSPOLICY SHARED
PREEMPTPOLICY SUSPEND
RESERVATIONPOLICY NEVER
FSPOLICY UTILIZEDPS
DEFERTIME 1:00
DEFERCOUNT 999
DEFERSTARTCOUNT 10
CREDWEIGHT 5
CLASSWEIGHT 8
QOSWEIGHT 2
QUEUETIMEWEIGHT 1
TARGETQUEUETIMEWEIGHT 1
CONSUMEDWEIGHT 3
QOSCFG[lopri] PRIORITY=10 QFLAGS=PREEMPTEE FLAGS=PREEMPTEE JOBFLAGS=PREEMPTEE
QOSCFG[hipri] PRIORITY=10000 QFLAGS=PREEMPTOR FLAGS=PREEMPTOR JOBFLAGS=PREEMPTOR
CLASSCFG[long-loprio] QDEF=lopri MAXMEM=1200 MAXJOBPERUSER=30
CLASSCFG[long] QDEF=lopri MAXMEM=1200
CLASSCFG[short] QDEF=lopri
CLASSCFG[interact] QDEF=hipri
CLASSCFG[swbuild] QDEF=hipri
NODECFG[DEFAULT] MAXJOB=5 MAXLOAD=3
USERCFG[DEFAULT] QTTARGET=0:00:01 QLIST=lopri,hipri
_______________________________________________
mauiusers mailing list
http://www.supercluster.org/mailman/listinfo/mauiusers