Hello everyone,
I'm trying to set up some basic preemption with a "suspend" policy within
Maui. The preemption part is working, except that the job that gets
preempted (suspended) doesn't resume execution until all other jobs
in the Idle queue have finished executing, even though those jobs don't have
the preemptor flag set and, as far as I can tell, don't have a higher
priority or xfactor than the suspended job either.
Looking at the logs, it seems to me that while the first job was
suspended and the preemptor was running, the next idle job in the queue
(with the same priority as the suspended one) was somehow given a
reservation on the node, so when the suspended job is supposed to resume,
it doesn't find an available node.
I would appreciate any hints in this regard.
Thanks.
David
1. Background
2. Relevant maui.log info
3. maui.cfg
***Background****
- Simple test (1 master and 1 node)
- Master is not in the execution loop (no pbs_mom)
- Node has 4 processors, and all jobs require 4 processors
- Job 38 is the preemptor (fast queue)
- Jobs 30 and 31 are preemptees (long queue)
- Job 30 had been started and was executing when
job 38 was submitted and preempted it.
When job 38 finished, job 30 (which was suspended)
should have resumed execution, and job 31 should
have waited in the idle queue. Instead, job 31 was
scheduled to start ahead of job 30, so
job 30 remains in suspended mode.
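
To make the expectation above concrete, here is a minimal Python sketch of the ordering I assumed would apply. It is only my mental model, not Maui's actual algorithm: the `Job` class and `expected_next` function are hypothetical names, and the tie-break rule (a suspended job goes before an idle job of equal priority) is my assumption.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    priority: int
    state: str  # "Suspended" or "Idle"


def expected_next(jobs):
    """Return the job I expected to get the node next.

    Assumption (mine, not Maui's documented behavior): highest priority
    wins, and on a priority tie a suspended job beats an idle one.
    """
    return max(jobs, key=lambda j: (j.priority, j.state == "Suspended"))


# State right after job 38 finished, with priorities taken from maui.log.
queue = [Job(30, 17, "Suspended"), Job(31, 17, "Idle")]
print(expected_next(queue).job_id)
```

Under that assumption the sketch picks job 30, whereas Maui actually started job 31.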
****Relevant maui.log info.*****
#### Job 38 just finished
INFO: active PBS job 38 has been removed from the queue. assuming
successful completion
MJobProcessCompleted(38)
.
.
INFO: job usage sent for job '38'
MJobRemove(38)
MResDestroy(38)
MResChargeAllocation(38,2)
MJobDestroy(38)
#### Job 30 had been preempted by 38, so it's in suspend mode,
#### but it should run now that 38 has finished and the rest
#### of the jobs in the queue are not preemptors.
MClusterUpdateNodeState()
MQueueSelectAllJobs(Q,HARD,ALL,JIList,DP,Msg)
INFO: job '30' Priority: 17
INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv:
17(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.
INFO: job '31' Priority: 17
INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv:
17(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.
MStatClearUsage([NONE],Idle)
INFO: total jobs selected (ALL): 6/6
MQueueSelectAllJobs(Q,SOFT,ALL,JIList,DP,Msg)
INFO: job '30' Priority: 17
INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv:
17(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
INFO: job '31' Priority: 17
INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv:
17(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
#### Why does job 30 not have adequate tasks or nodes found if the
#### node is free, and why does that same node get assigned to
#### job 31?
MStatClearUsage([NONE],Idle)
INFO: total jobs selected (ALL): 6/6
MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,FALSE)
INFO: total jobs selected in partition ALL: 6/6
INFO: 4 feasible tasks found for job 30:0 in partition DEFAULT (4 Needed)
INFO: inadequate feasible tasks found for job 30:0 (0 < 4)
INFO: inadequate nodes found for job 30:0 (0 < 1)
MQueueScheduleRJobs(Q)
MResDestroy(31)
MResChargeAllocation(31,2)
MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
INFO: total jobs selected in partition ALL: 6/6
MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,DEFAULT,FReason,TRUE)
INFO: total jobs selected in partition DEFAULT: 6/6
MQueueScheduleIJobs(Q,DEFAULT)
INFO: 4 feasible tasks found for job 31:0 in partition DEFAULT (4 Needed)
INFO: tasks located for job 31: 4 of 4 required (4 feasible)
MJobStart(31)
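
In case it helps anyone compare against their own logs, below is a rough Python sketch of how the per-job scheduling lines above could be pulled out of maui.log. The regular expressions are copied from the wording in this excerpt only, so other Maui versions or log levels may need different patterns; `lines_by_job` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Patterns based solely on the log lines quoted in this post (an assumption;
# other Maui versions may word these messages differently).
PATTERNS = [
    re.compile(r"feasible tasks found for job (\d+):"),
    re.compile(r"inadequate \w+ found for job (\d+):"),
    re.compile(r"tasks located for job (\d+):"),
    re.compile(r"^(?:MJobStart|MResDestroy)\((\d+)\)"),
]


def lines_by_job(log_text):
    """Group the scheduling-related log lines by job id."""
    result = defaultdict(list)
    for line in log_text.splitlines():
        line = line.strip()
        for pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                result[int(match.group(1))].append(line)
                break  # one bucket per line is enough
    return dict(result)


sample = """\
INFO: 4 feasible tasks found for job 30:0 in partition DEFAULT (4 Needed)
INFO: inadequate feasible tasks found for job 30:0 (0 < 4)
INFO: inadequate nodes found for job 30:0 (0 < 1)
MResDestroy(31)
INFO: 4 feasible tasks found for job 31:0 in partition DEFAULT (4 Needed)
INFO: tasks located for job 31: 4 of 4 required (4 feasible)
MJobStart(31)
"""

grouped = lines_by_job(sample)
for job_id, job_lines in grouped.items():
    print(job_id)
    for entry in job_lines:
        print("   ", entry)
```

Grouped this way, job 30 shows 4 feasible tasks followed immediately by "inadequate" messages, while job 31 has a reservation destroyed (MResDestroy) and is then started on the same node.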
***** MAUI.CFG ****
PREEMPTPOLICY SUSPEND
QUEUETIMEWEIGHT 1
CREDWEIGHT 1
USERWEIGHT 1
GROUPWEIGHT 1
XFACTORWEIGHT 1
QOSWEIGHT 1
JOBPRIOACCRUALPOLICY FULLPOLICY
XFACTORCAP 10000
XFMINWCLIMIT 0:01:00
CLASSCFG[long] QDEF=long
CLASSCFG[fast] QDEF=fast
QOSCFG[long] QFLAGS=PREEMPTEE PRIORITY=10
QOSCFG[fast] QFLAGS=PREEMPTOR PRIORITY=1000
NODEALLOCATIONPOLICY MINRESOURCE
BACKFILLPOLICY FIRSTFIT
RESERVATIONPOLICY CURRENTHIGHEST