Hi,

> we've been using torque/maui for a long time. Our initial cluster was
> about 50 nodes and now ~350 with 3k processors.
>
> It has been working fine since last cluster upgrade, when we added
> last 500 processors. Since then, maui client commands hang and we had
> to increase poll interval cause scheduling cycle took too much... Now,
> with a system with 3k running jobs and 3k in queue, we're facing more
> maui issues...
>
> So, we were wondering which are maui limits, if we have reached any of
> them and if anyone who already reached our limits could share his
> experience, on solving them, with us.
>
> we're running maui-3.3-1.x86_64.

I would advise defining a limit on idle jobs per user. For example:

    USERCFG[DEFAULT] MAXIJOB=200

or any number suitable for your site.

Alternatively, Torque has a per-queue max_user_queuable setting, but it counts both running and queued jobs. If you use a routing queue to route jobs to an execution queue, you can define this limit on the execution queue, and jobs will be moved to the execution queue only when the limit is respected.

Both solutions should decrease the load on Maui, as it does not need to schedule as many jobs at a time.

-- 
Michel Béland, scientific computing analyst
[email protected]
office S-250, pavillon Roger-Gaudry (main building), Université de Montréal
telephone: 514 343-6111 ext. 3892    fax: 514 343-2155
Calcul Québec (www.calculquebec.ca)
Calcul Canada (calculcanada.org)
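
For reference, the routing-queue approach described in the reply could be set up roughly as follows. This is only a sketch: the queue names `route` and `batch` are hypothetical, the value 200 simply reuses the example number from the reply, and a real site would also need its usual resource limits and ACLs on these queues.

```shell
# Execution queue: cap the number of jobs (running + queued) each user may have in it.
qmgr -c "create queue batch queue_type=execution"
qmgr -c "set queue batch max_user_queuable = 200"   # assumed value, tune per site
qmgr -c "set queue batch enabled = true"
qmgr -c "set queue batch started = true"

# Routing queue: users submit here; jobs move on to 'batch' only while
# the per-user limit on 'batch' is not exceeded.
qmgr -c "create queue route queue_type=route"
qmgr -c "set queue route route_destinations = batch"
qmgr -c "set queue route enabled = true"
qmgr -c "set queue route started = true"

# Make the routing queue the default submission target.
qmgr -c "set server default_queue = route"
```

With a layout like this, jobs over the limit wait in the routing queue rather than in the execution queue, which is where the reply expects the reduction in Maui's scheduling load to come from.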
