Hi Bill,

You wrote:
> Running maui/torque on a number of clusters. Have never used the
> preemption stuff before but upon searching through the documentation what
> I'd like to do might not be able to be done here.
>
> I'd like 4 classes/queues. A preempts B preempts C and D preempts C.
>
> C is the general purpose totally preemptable class. Both D and B can get
> their bits they need by preempting jobs in this class. And I believe
> that this would be all fine and good except for this A class which needs
> full access to the machine (less D class jobs) NOW!
>
> I've looked through the Maui docs as well as the moab ones to no avail.
> Preliminary searching leads me to SGE and LSF. But it sure would be
> nice to make this something maui could handle.
>
> Anyone know of a way or something I may be missing? I guess I could
> kill jobs out of a prologue script for A class jobs until I freed enough
> cores. I don't know.
If I understand your problem correctly, you can solve it like this:

  # Short defer times, but allow many deferrals
  DEFERSTARTCOUNT  3
  DEFERTIME        0:00:50
  DEFERCOUNT       500

  # Put priority on QOS only.
  # The zeroes are there, so b jobs do not preempt b jobs,
  # and so c jobs do not preempt c jobs.
  QOSWEIGHT        1
  QUEUETIMEWEIGHT  0
  XFACTORWEIGHT    0

  # Requeue jobs. There are other policies, but I have not tried them
  PREEMPTIONPOLICY REQUEUE

  # Define the queues
  CLASSCFG[a] QDEF=acos
  CLASSCFG[b] QDEF=bcos
  CLASSCFG[c] QDEF=ccos
  CLASSCFG[d] QDEF=dcos

  QOSCFG[acos] PRIORITY=500000 QFLAGS=PREEMPTOR
  QOSCFG[bcos] PRIORITY=300000 QFLAGS=PREEMPTOR,PREEMPTEE
  QOSCFG[ccos] PRIORITY=100000 QFLAGS=PREEMPTOR,PREEMPTEE
  QOSCFG[dcos] PRIORITY=200000 QFLAGS=PREEMPTOR

The solution is not perfect, because of a Maui bug: after Maui has sent
the Requeue order to Torque, it does not wait for the requeue to complete
but immediately asks for the new job to run on nodes that may not yet
have been vacated. (Moab has the same problem, but tries to work around
it by retrying the job start many times.)

As far as I know, CRI has not yet fixed the bug. I have fixed it myself
in a straightforward way, by putting in a sleep statement after the
Requeue request. Other members of this list may have more elegant
fixes...

Best regards,
-- Lennart Karlsson <[email protected]>
   National Supercomputer Centre in Linkoping, Sweden
   http://www.nsc.liu.se
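
P.S. To sanity-check which class should end up preempting which under
the QOS table above, here is a small Python sketch. It is not Maui code
and does not talk to Maui. It assumes the simplified rule that a job
preempts another only if its QOS carries PREEMPTOR, the other's QOS
carries PREEMPTEE, and its priority is strictly higher (which holds here
because only QOSWEIGHT is non-zero). The names QOS, can_preempt and
preemption_pairs are made up for illustration.

```python
# Hypothetical model of the QOSCFG lines above; not part of Maui.
QOS = {
    "a": {"priority": 500000, "flags": {"PREEMPTOR"}},
    "b": {"priority": 300000, "flags": {"PREEMPTOR", "PREEMPTEE"}},
    "c": {"priority": 100000, "flags": {"PREEMPTOR", "PREEMPTEE"}},
    "d": {"priority": 200000, "flags": {"PREEMPTOR"}},
}


def can_preempt(winner, loser):
    """Assumed rule: PREEMPTOR beats PREEMPTEE only at strictly higher priority."""
    w, l = QOS[winner], QOS[loser]
    return (
        "PREEMPTOR" in w["flags"]
        and "PREEMPTEE" in l["flags"]
        and w["priority"] > l["priority"]
    )


def preemption_pairs():
    """Return every (winner, loser) class pair allowed by the assumed rule."""
    return sorted((x, y) for x in QOS for y in QOS if can_preempt(x, y))


if __name__ == "__main__":
    for winner, loser in preemption_pairs():
        print(f"{winner} preempts {loser}")
```

Under that assumption, a preempts b and c, b and d preempt c, and
nothing preempts a or d, which matches what you asked for. Equal
priorities within a class are what keep b from preempting b and c from
preempting c.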
