Hello List,

It has been a while since I was able to post on a regular basis, due to being given the opportunity to seek other employment! I have landed, gracefully, in a position where I have been tasked with designing a large-scale Nagios installation. The requirements all come from the client and are pretty much non-negotiable. I need a little advice on where to start. I will describe the environment and then lay out how I see the design coming together.

The environment is an HPCC environment, and the requirements are based almost exclusively on that aspect. Initially the monitoring will cover the clusters only, and will later expand to other servers outside of the HPCC environment.

Requirements:
1. No client installed on compute nodes (there is an HA head node where a client or a full install could be done).
2. No active checks directly to compute nodes.
3. Ganglia is available for node data.

That is the majority of the requirements. Ganglia makes things a bit easier, but I am not sure how much easier. It looks like GroundWork could handle this, but I don't see the large-scale features available in the open source version.

The environment is as follows:
1. 80 clusters
2. Each cluster has 70-72 compute nodes

The client wants a single point of monitoring for this environment. I am looking at the following setup: use the ganglia plugin from Nagios Exchange on the HA head node to gather and parse the data, and have it report back to a main Nagios server (HA) as the single point of monitoring.

What I don't know is how Nagios 3.x will scale with ~5000-6000 hosts coming into a single point of monitoring. What cannot happen is the checks causing any degradation in the HPCC environment. Ganglia is already in place and accounted for in performance, so querying the ganglia process is allowed, but they would prefer to pull this data from gmetad and not gmond.

Also, from the Nagios management side, I would like to see if there is a way to automatically add hosts when a new host pops up in the ganglia data. This is not a deal breaker, but it will make life so much easier in the long run.

I have likely not given enough information somewhere, but I think there is enough here to get a discussion started. It's good to be back!

Regards,
Mark L. Potter
eXcellence in IS Solutions, Inc.
(X-ISS)
Office: 713-862-9200 x219
http://www.x-iss.com

_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
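
P.S. To make the idea above a bit more concrete, here is a rough Python sketch of what I picture running on each HA head node. It is not the Nagios Exchange plugin, just an illustration under some assumptions: gmetad is listening on its default XML port (8651), a host counts as down when its TN (seconds since the last report) exceeds 4 x TMAX, and results go to the central server as passive host checks in the tab-separated format that send_nsca reads on stdin. The function names, the 4 x TMAX rule, and the "generic-host" template are placeholders of mine, not anything the client has specified.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmetad_xml(host="localhost", port=8651, timeout=10.0):
    """Read the XML dump gmetad sends on connect (8651 assumed as xml_port)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmetad closes the socket after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_hosts(xml_data):
    """Return one dict per HOST element found under any CLUSTER element."""
    root = ET.fromstring(xml_data)
    hosts = []
    for cluster in root.iter("CLUSTER"):
        for host in cluster.iter("HOST"):
            hosts.append({
                "cluster": cluster.get("NAME"),
                "name": host.get("NAME"),
                "ip": host.get("IP"),
                "tn": int(host.get("TN", "0")),
                "tmax": int(host.get("TMAX", "20")),
            })
    return hosts


def passive_host_lines(xml_data, down_factor=4):
    """Build passive host check lines: host<TAB>return_code<TAB>output."""
    lines = []
    for h in parse_hosts(xml_data):
        # Assumption: a node is down if it has been silent for 4 x TMAX.
        state = 1 if h["tn"] > down_factor * h["tmax"] else 0
        label = "DOWN" if state else "UP"
        lines.append(
            f"{h['name']}\t{state}\t{label}: last Ganglia report "
            f"{h['tn']}s ago (cluster {h['cluster']})"
        )
    return lines


def new_host_definitions(xml_data, known_hosts, template="generic-host"):
    """Return Nagios host stanzas for Ganglia hosts not yet in known_hosts."""
    stanzas = []
    for h in parse_hosts(xml_data):
        if h["name"] in known_hosts:
            continue
        stanzas.append(
            "define host {\n"
            f"    use                     {template}\n"
            f"    host_name               {h['name']}\n"
            f"    address                 {h['ip']}\n"
            "    active_checks_enabled   0\n"
            "    passive_checks_enabled  1\n"
            "}\n"
        )
    return stanzas
```

The output of passive_host_lines(fetch_gmetad_xml()) could be piped to send_nsca from cron on the head node, so the compute nodes are never touched and only gmetad is queried. The stanzas from new_host_definitions() would still need to be written to a config file and Nagios reloaded on the central server, which is the part I have not worked out yet.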