On Fri, Aug 31, 2007 at 10:18:58AM +0330, Farid Behnia wrote:
> Hi,
>
> I've put together a simple 2-node cluster using Debian etch, OpenMPI, FAI
> & Cfengine.
> I'm looking for ideas that can help me build a better self-healing
> cluster. Right now I'm writing rule files for cfengine and would appreciate
> any input on sample files and important configurations that need to be made
> for the cluster's health. (Although they are site-specific, I'm sure I can
> get good hints out of them.)
>
> However, I'd also be glad to hear if you have any monitoring system in mind
> that can cooperate with cfengine in the maintenance job. I've looked briefly
> into Ganglia and Nagios so far. It seems Ganglia is mostly meant for large
> (groups of) clusters and focuses on hardware resources. Nagios seems to be
> better suited for my job, but the gurus on the cfengine mailing list believe
> that cfenvd & cfexecd can provide equal monitoring & recovery capability (in
> terms of response time).
> What's your take on either of them?
>
> Thanks in advance to anyone sharing their experience.

Although it's not exactly FAI-related, you might have a look at Gluster:
http://www.gluster.org

Steffen
-- 
Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
Cluster Admin * http://pandora.aei.mpg.de/merlin/ * http://www.aei.mpg.de/
* e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298}
No Word/PPT mails - http://www.gnu.org/philosophy/no-word-attachments.html
