Hi Arnau, Jason

Well, I guess I should consider myself happy to administer only small clusters. :)

Now, how about the [terse] guidance in the Maui Admin Guide for large clusters?

http://www.adaptivecomputing.com/resources/docs/maui/a.ilargeclusters.php

And the [slightly more verbose] one for Torque:

http://www.adaptivecomputing.com/resources/docs/torque/a.flargeclusters.php

Would they help with scalability?

Cheers,
Gus Correa

Jason Williams wrote:
> I've noticed similar things when my cluster gets loaded too. I find it
> annoying that if maui gets behind, and "misses" scheduler iterations,
> because it's working on high job turn around, it has to catch up on the
> missed iterations. Also, while maui is scheduling things, there is what
> appears to be a type of global "lock" or block on all communications to
> maui. So if you get very busy, and start missing many iterations, it
> can sometimes be over 30 minutes to over an hour before maui starts
> responding again. To users, this may look like a deadlock, but really,
> when you look at the logs, maui is just going nuts trying to catch up.
>
> I've been meaning to look at the code to figure out what the heck is
> going on, but I haven't had time.
>
> Basically, that's my long winded way of saying "I have seen this too,
> Arnau." And that I don't really have a good way around it aside from
> setting limitations as another member suggested.
>
> --
> Jason Williams
> Sr. Systems Administrator
> Homewood HPC Cluster
> Johns Hopkins University
>
> On 9/28/2011 10:40 AM, Arnau Bria wrote:
>> Hi all,
>>
>> we've been using torque/maui for a long time. Our initial cluster was
>> about 50 nodes and now ~350 with 3k processors.
>>
>> It has been working fine since last cluster upgrade, when we added
>> last 500 processors. Since then, maui client commands hang and we had
>> to increase poll interval cause scheduling cycle took too much... Now,
>> with a system with 3k running jobs and 3k in queue, we're facing more
>> maui issues...
>>
>> So, we were wondering which are maui limits, if we have reached any of
>> them and if anyone who already reached our limits could share his
>> experience, on solving them, with us.
>>
>> we're running maui-3.3-1.x86_64.
>>
>>
>> Many thanks in advance,
>> Cheers,
>> Arnau
>> _______________________________________________
>> mauiusers mailing list
>> [email protected]
>> http://www.supercluster.org/mailman/listinfo/mauiusers
