On Tue, Aug 08, 2006 at 10:39:59AM -0700, Sam Rash alleged:
> Ooh, I may have missed something: we regularly hit maui with 5k jobs
> daily--the default for MMAX_JOB is 4096. What does this actually mean?
> Only 4096 will be considered by maui at a time? (ie, left in the RM)

Correct. Any jobs after the max are simply ignored. When you think about
it, 4096 jobs can't actually run at once (you don't have that many
nodes), so there isn't much need for maui to read in more jobs.

When I came across this problem on my own cluster, I found that the "bad
user" would always exceed whatever max jobs I built into maui. A strategy
to deal with this is to use routing queues in TORQUE...

    set server default_queue = default
    create queue default queue_type=R,route_destinations=mainexec
    create queue mainexec queue_type=E,max_queuable=1000

I have a fairly deeply nested set of routing queues for different groups
of users, each with different max resources, ACLs, max_queuable limits,
and max_user_run limits. The idea is to prevent a user in one group from
swamping maui and keeping other queues from executing.

_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
