Do any of you who run very large clusters find that Ganglia fits the bill? Do you notice a performance hit from it? How much tuning have you done with it? Do you wish it were better in some way, and if so, how? I'm looking for experience with very large clusters here, since we already use Ganglia on a number of clusters in the 32-256 node range.

For those of you with even more detailed knowledge, and perhaps the time to read a paper, have you figured out whether it introduces jitter into your cluster, and if so, how to avoid it?

relevant paper:
http://www.sc-conference.org/sc2003/paperpdfs/pap301.pdf

The short of it is that for tightly coupled jobs, overhead on one node can affect every node involved in the job. If one node has to spend 1 ms dealing with some overhead on the system, all the other nodes end up spending that 1 ms waiting for it to finish.
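
To make that concrete, here is a rough sketch of the effect, not anything taken from the paper. It assumes a job that synchronizes at a barrier after every step, and it assumes interruptions hit nodes independently. The function names and all the numbers (10 ms of compute, a 1% chance of interruption per node per step) are made up for illustration.

```python
def step_time(compute_ms, per_node_overhead_ms):
    """Time for one barrier-synchronized step.

    Every node waits at the barrier for the slowest one, so the step
    costs the compute time plus the largest overhead on any node.
    """
    return compute_ms + max(per_node_overhead_ms)


def prob_step_delayed(p_node, n_nodes):
    """Chance that at least one of n_nodes is interrupted during a step.

    Assumes interruptions are independent, each with probability p_node
    per node per step (an illustrative assumption, not a measurement).
    """
    return 1.0 - (1.0 - p_node) ** n_nodes


# Hypothetical case: one node out of 256 loses 1 ms to a daemon.
overheads = [0.0] * 256
overheads[17] = 1.0
print(step_time(10.0, overheads))  # 11.0: all 256 nodes pay the 1 ms

# The chance that some node is hit grows quickly with cluster size.
for n in (32, 256, 1024):
    print(n, round(prob_step_delayed(0.01, n), 3))
```

Under these assumptions, an interruption that is rare on any single node becomes close to certain on some node in every step once the job spans a thousand nodes or so, which is why I'm asking specifically about very large clusters.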

Doug Nordwall
Unix Administrator
EMSL Computer and Network Support
Unclassified Computer Security
Phone: (509)372-6776; Fax: (509)376-0420
The best book on programming for the layman is "Alice in Wonderland"; but that's because it's the best book on anything for the layman.



_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
