We are running Maui-3.2.6p14 along with Torque-2.1.8 on a cluster of over 100 nodes.
We have the following "Standing Reservation" configured:
SRCFG[development] STARTTIME=8:00:00 ENDTIME=18:00:00
SRCFG[development] ACCESS=DEDICATED
SRCFG[development] NODEFEATURES=prod
SRCFG[development] PERIOD=DAY DAYS=MON,TUE,WED,THU,FRI
SRCFG[development] PRIORITY=200
SRCFG[development] TASKCOUNT=14
SRCFG[development] MAXTIME=2:00:00
Sometimes we also need to reserve the nodes for maintenance. For this we
normally use Administrative Reservations, created with a command such as:
setres -u root -s 9:00_06/25 -d 3:00 ALL
On occasion these "Administrative Reservations" simply disappear, and lately
it has been happening repeatedly.
After perusing the log files, it appears that the Administrative Reservations
disappear when Maui decides to "re-shuffle" the Standing Reservations. I have
included a "snapshot" of the relevant portion of the log files at the end of
this message.
From the log files, it looks almost as if the "Standing Reservations" preempt
the existing "Administrative Reservation" (see the lines marked with * in the
maui.log excerpt below). This happens even to Administrative Reservations that
are not scheduled to start for days.
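
For anyone wanting to check a longer maui.log for the same pattern, here is a minimal sketch in Python. It assumes the trace lines look like the ones in the excerpt in this message, and it treats the argument of MResPreempt as the reservation on whose behalf the preemption runs; that reading is an assumption drawn from this log, not from the Maui source. The function name is made up for illustration.

```python
import re

# Matches trace lines such as "06/21 11:10:01 MResDestroy(root.0)".
# An optional leading "* " (the marker used in this message) is tolerated.
CALL = re.compile(
    r"^\*?\s*(\d\d/\d\d \d\d:\d\d:\d\d)\s+(MResPreempt|MResDestroy)\(([^)]*)\)"
)

def destroyed_after_preempt(lines):
    """Yield (timestamp, preempt_arg, destroyed) for each MResDestroy that
    directly follows an MResPreempt and names a different reservation."""
    preempt_arg = None
    for line in lines:
        m = CALL.match(line)
        if not m:
            continue
        stamp, func, arg = m.groups()
        if func == "MResPreempt":
            preempt_arg = arg
        else:
            # MResDestroy: report it only if a preempt call came just before
            if preempt_arg is not None and arg != preempt_arg:
                yield stamp, preempt_arg, arg
            preempt_arg = None

sample = [
    "06/21 11:10:01 MResDestroy(development.0.0)",
    "* 06/21 11:10:01 MResPreempt(development.0.0)",
    "* 06/21 11:10:01 MResDestroy(root.0)",
    "06/21 11:10:01 MSRSetRes(development,0,0)",
]
hits = list(destroyed_after_preempt(sample))
for stamp, preempt_arg, destroyed in hits:
    print(stamp, preempt_arg, "->", destroyed)
```

To scan a real log, pass an open file object instead of the sample list; the path to maui.log depends on the installation.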
Has anyone seen this before?
(I am not sure whether this has anything to do with the disappearance, but I
also noticed that the administrative reservations are created with the
"PREEMPTEE" flag set, and I can't seem to change this.
I have lowered the priority of the standing reservation from 200 to 0
to see if this helps.)
maui.log:
06/21 11:10:01 WARNING: job 'development.0' has NULL cred list
06/21 11:10:01 INFO: adequate tasks found for all reqs (time 00:00:00)
06/21 11:10:01 MJobNLDistribute(development.0,SrcMNL,DstMNL)
06/21 11:10:01 INFO: tasks found for job development.0 (tasks requested: 14)
06/21 11:10:01 MJobAllocMNL(development.0,MFeasibleList,NodeMap,MOutList,MINRESOURCE,1182449401)
06/21 11:10:01 INFO: tasks located for job development.0: 3 of 3 required (3 feasible)
06/21 11:10:01 INFO: allocated MNode[000]x1 'node004' to development.0:0
06/21 11:10:01 INFO: allocated MNode[001]x1 'node002' to development.0:0
06/21 11:10:01 INFO: allocated MNode[002]x1 'node001' to development.0:0
06/21 11:10:01 MResDestroy(development.0.0)
06/21 11:10:01 MResChargeAllocation(development.0.0,2)
06/21 11:10:01 MSysRegEvent(RESERVATIONDESTROYED: development.0.0 User 1182449401 1182449385 1182474000 0 ,0,0,1)
06/21 11:10:01 MSysLaunchAction(ASList,1)
06/21 11:10:01 MResCreate(User,ACL,NULL,514,NodeList,1182449401,1182474000,3,0,development.0,ResP,'',DRes)
06/21 11:10:01 WARNING: partial standing reservation development reserved 12 of 56 procs in partition '[ALL]' to start in 00:00:00 at (1182449401) Thu Jun 21 11:10:01
* 06/21 11:10:01 MResPreempt(development.0.0)
* 06/21 11:10:01 MResDestroy(root.0)
* 06/21 11:10:01 MResChargeAllocation(root.0,2)
* 06/21 11:10:01 MSysRegEvent(RESERVATIONDESTROYED: root.0 User 1182449401 1182787200 1182794400 0 ,0,0,1)
06/21 11:10:01 MSysLaunchAction(ASList,1)
06/21 11:10:01 MSRSetRes(development,0,0)
06/21 11:10:01 MJobSetCreds(development.0,[ALL],[ALL],[ALL])
06/21 11:10:01 MSRGetAttributes(development,0,Start,Duration)
06/21 11:10:01 INFO: attempting standing reservation of 56 procs in 00:00:00 for 6:49:59
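
As a cross-check on the claim that the destroyed reservation was still days away, the epoch values in the RESERVATIONDESTROYED event for root.0 can be decoded with a short Python sketch. It assumes the three integers after "User" are the event time, the reservation start, and the reservation end, in that order. That field order is inferred from the development.0.0 event in the same log (whose last value minus the event time gives the 6:49:59 that Maui reports), not taken from Maui documentation. Only differences are computed, so no time zone has to be guessed.

```python
import re
from datetime import timedelta

# Assumed layout: RESERVATIONDESTROYED: <name> <type> <logged> <start> <end> ...
EVENT = re.compile(
    r"RESERVATIONDESTROYED:\s+(\S+)\s+\S+\s+(\d+)\s+(\d+)\s+(\d+)"
)

def describe(event_line):
    """Return (name, time from event to start, reservation length)."""
    m = EVENT.search(event_line)
    if m is None:
        raise ValueError("not a RESERVATIONDESTROYED event")
    name = m.group(1)
    logged, start, end = (int(g) for g in m.groups()[1:])
    return name, timedelta(seconds=start - logged), timedelta(seconds=end - start)

line = ("MSysRegEvent(RESERVATIONDESTROYED: root.0 User "
        "1182449401 1182787200 1182794400 0 ,0,0,1)")
name, lead, length = describe(line)
print(name, "| starts in", lead, "| lasts", length)
# prints: root.0 | starts in 3 days, 21:49:59 | lasts 2:00:00
```

Under that reading, root.0 was due to start 3 days 21:49:59 after the log entry at 06/21 11:10:01, that is at 09:00:00 on 06/25, and was two hours long, yet it was destroyed in the same second in which the standing reservation was rebuilt.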