On Wed, 7 Nov 2007, Douglas Nordwall wrote:

> Do any of you who have very large clusters find that ganglia fills the bill
> for your cluster?

Check out
http://www.rocksclusters.org/rocks-register/index.php?sortby=CPUs&sortorder=down

It shows a list, ordered by CPU count, of clusters using Rocks / Ganglia.
In my experience, a cluster with 2K nodes can easily be handled by Ganglia
if it is properly configured.

> Do you notice a performance hit from it? 

In my experience the resource utilization is minimal.  For large clusters
I recommend using Ganglia's grid model, which improves scalability through
metric aggregation.
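
To illustrate what aggregation buys you, here is a minimal Python sketch.
It is not Ganglia code: the names `summarize_cluster` and `grid_view` and
the sample data are made up for illustration.  The idea is that each
cluster-level collector reduces its per-node values to a summary (sum and
host count), so the top level handles one record per cluster and metric
instead of one per node.

```python
from typing import Dict, Tuple

# Hypothetical per-node readings: node -> metric -> value.
NodeMetrics = Dict[str, Dict[str, float]]
Summary = Dict[str, Tuple[float, int]]


def summarize_cluster(nodes: NodeMetrics) -> Summary:
    """Reduce per-node values to (sum, host count) per metric."""
    summary: Summary = {}
    for metrics in nodes.values():
        for name, value in metrics.items():
            total, count = summary.get(name, (0.0, 0))
            summary[name] = (total + value, count + 1)
    return summary


def grid_view(clusters: Dict[str, NodeMetrics]) -> Dict[str, Summary]:
    """What the top level sees: one summary per cluster, not one record per node."""
    return {cname: summarize_cluster(nodes) for cname, nodes in clusters.items()}


if __name__ == "__main__":
    clusters = {
        "rack-a": {"n1": {"load_one": 0.5}, "n2": {"load_one": 1.5}},
        "rack-b": {"n3": {"load_one": 2.0}},
    }
    for cname, summary in grid_view(clusters).items():
        for metric, (total, hosts) in summary.items():
            print(f"{cname} {metric}: mean={total / hosts:.2f} over {hosts} hosts")
```

With 2K nodes split into, say, 20 sub-clusters, the top level in this model
tracks 20 summaries per metric rather than 2000 individual values.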

> How much tuning
> have you done with it? 

Actually, not much.  Once I had designed the grid layout with scalability
as the main design goal, Ganglia pretty much worked out of the box.
I did some tuning with RRD, though.  Essentially I moved the RRD files
to /dev/shm, which is memory-backed.
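
One caveat with this approach: anything under /dev/shm is lost on reboot.
A minimal Python sketch of copying the RRD tree back to disk from time to
time follows.  The function name `backup_rrds` and any paths you pass to it
are assumptions for illustration, so adjust them to wherever your gmetad
keeps its RRDs.

```python
import shutil
from pathlib import Path


def backup_rrds(src: Path, dst: Path) -> int:
    """Copy every *.rrd file under src to the same relative path under dst.

    Returns the number of files copied.
    """
    copied = 0
    for rrd in src.rglob("*.rrd"):
        target = dst / rrd.relative_to(src)
        # Recreate the cluster/host directory layout on the disk side.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(rrd, target)  # copy2 also preserves timestamps
        copied += 1
    return copied
```

Something like `backup_rrds(Path("/dev/shm/rrds"), Path("/var/backup/rrds"))`
could be run from cron (both paths are placeholders).  A file copied while it
is being updated may be inconsistent, so treat this as a starting point only.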

> Do you wish it was better in some fashion, and how so?

It does a really good job for me.  However, there is always room for 
improvement, as the recent 3.1.0 wishlist demonstrates :-)

> I'm looking for experience with very large clusters here, since we already
> have a number of clusters in the 32-256 node range that we use ganglia on.
> 
> For those of you with even more detailed knowledge, and perhaps the time to
> read a paper, have you figured out if it introduces jitter into your cluster,
> and if so, how to avoid it?
> 
> relevant paper:
> http://www.sc-conference.org/sc2003/paperpdfs/pap301.pdf
> 
> The short of it is that for tightly coupled jobs, the overhead of one job can
> affect all the nodes involved in the job. One node has to spend 1ms dealing
> with some overhead on the system, and all the nodes end up spending that 1ms
> waiting for him to get done.
> 
> Doug Nordwall
> Unix Administrator
> EMSL Computer and Network Support
> Unclassified Computer Security
> Phone: (509)372-6776; Fax: (509)376-0420
> The best book on programming for the layman is "Alice in Wonderland"; but
> that's because it's the best book on anything for the layman.
> 
> 
> 
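
The jitter mechanism described in the quoted text can be made concrete with
a minimal Python sketch.  It is a toy model, not a measurement: it assumes a
bulk-synchronous job where every step ends at a barrier, so each step takes
as long as its slowest node.  The function name and the numbers are made up
for illustration.

```python
def job_time_ms(per_step_delays_ms, compute_ms):
    """Total wall time of a barrier-synchronized job.

    per_step_delays_ms: one list per step holding each node's extra delay (ms).
    compute_ms: compute time per step, identical on every node.
    """
    # Each step ends when the slowest node reaches the barrier.
    return sum(compute_ms + max(delays) for delays in per_step_delays_ms)


if __name__ == "__main__":
    nodes, steps = 4, 3
    quiet = [[0.0] * nodes for _ in range(steps)]
    # A different single node is interrupted for 1 ms in each step.
    noisy = [[1.0 if n == s else 0.0 for n in range(nodes)] for s in range(steps)]
    print(job_time_ms(quiet, 10.0))  # 30.0
    print(job_time_ms(noisy, 10.0))  # 33.0
```

No node loses more than 1 ms over the whole run, yet the job loses 3 ms,
because every step waits for whichever node was interrupted.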

_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
