On Wed, 7 Nov 2007, Douglas Nordwall wrote:
> Do any of you who have very large clusters find that ganglia fills the bill
> for your cluster?

Check out
http://www.rocksclusters.org/rocks-register/index.php?sortby=CPUs&sortorder=down
It shows a list, ordered by CPU count, of clusters using Rocks / Ganglia. My
personal experience is that a cluster with 2K nodes can easily be handled by
Ganglia, if properly configured.

> Do you notice a performance hit from it?

The resource utilization is minimal in my experience. For large clusters I
recommend using Ganglia's grid model to improve scalability through metric
aggregation.

> How much tuning
> have you done with it?

Actually, not much. Once I had designed the grid layout with scalability as
the main design goal, Ganglia pretty much worked out of the box. I did some
tuning of RRD, though: essentially, I moved the RRD files to /dev/shm (a
RAM-backed tmpfs), which avoids the disk I/O caused by the frequent RRD
updates.

> Do you wish it was better in some fashion, and how so?

It does a really good job for me. However, there is always room for
improvement, as the recent 3.1.0 wishlist demonstrates :-)

> I'm looking for experience with very large clusters here, since we already
> have a number of clusters in the 32-256 node range that we use ganglia on.
>
> For those of you with even more detailed knowledge, and perhaps the time to
> read a paper, have you figured out if it introduces jitter into your cluster,
> and if so, how to avoid it?
>
> relevant paper:
> http://www.sc-conference.org/sc2003/paperpdfs/pap301.pdf
>
> The short of it is that for tightly coupled jobs, the overhead of one job can
> affect all the nodes involved in the job. One node has to spend 1ms dealing
> with some overhead on the system, and all the nodes end up spending that 1ms
> waiting for him to get done.
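
A rough way to see why this effect grows with node count: if each node is
independently hit by a short interruption (say from a monitoring daemon) with
some small probability per synchronization step, the chance that at least one
node is hit is 1 - (1 - p)^N, which approaches 1 as N grows. Since the barrier
waits for the slowest node, that probability directly scales the expected
delay. The Python sketch below is a toy model under exactly those assumptions;
the function names and the numbers used (10 ms steps, 1 ms interruptions,
p = 0.001) are hypothetical and illustrative, not measurements of Ganglia or
of any real cluster.

```python
"""Toy model of OS-noise amplification in a bulk-synchronous (barrier) job.

Assumptions (illustrative only, not measured):
  * every step, each node computes for `compute_ms`, then all nodes meet at a barrier;
  * independently on each node, with probability `p_noise` per step, a daemon
    steals `noise_ms` of CPU time (at most once per node per step);
  * the step ends only when the slowest node reaches the barrier.
"""


def p_step_delayed(nodes: int, p_noise: float) -> float:
    """Probability that at least one of `nodes` nodes is interrupted in a step."""
    return 1.0 - (1.0 - p_noise) ** nodes


def expected_step_ms(nodes: int, compute_ms: float, noise_ms: float, p_noise: float) -> float:
    """Expected step time: one delayed node delays the whole barrier by noise_ms."""
    return compute_ms + noise_ms * p_step_delayed(nodes, p_noise)


if __name__ == "__main__":
    compute = 10.0  # hypothetical compute time per step, in ms
    for n in (1, 32, 256, 2048):
        t = expected_step_ms(n, compute_ms=compute, noise_ms=1.0, p_noise=0.001)
        slowdown_pct = (t / compute - 1.0) * 100.0
        print(f"{n:5d} nodes: expected step {t:.3f} ms ({slowdown_pct:.1f}% slowdown)")
```

With these made-up numbers the per-node noise is identical at every scale,
yet the expected slowdown goes from negligible on one node to several percent
at 2048 nodes, which is the amplification described in the quoted paragraph.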
>
> Doug Nordwall
> Unix Administrator
> EMSL Computer and Network Support
> Unclassified Computer Security
> Phone: (509)372-6776; Fax: (509)376-0420
> The best book on programming for the layman is "Alice in Wonderland"; but
> that's because it's the best book on anything for the layman.

_______________________________________________
Ganglia-general mailing list
https://lists.sourceforge.net/lists/listinfo/ganglia-general

