Re: [Nagios-users] lun monitoring
Russell Adams wrote:

On Mon, Feb 02, 2009 at 07:10:45PM -0700, Tom Ammon wrote:

Russell, Cacti is pretty SNMP-centric, but in our environment that is about all we are using it for anyway. I'm no Cacti expert, but to me, that's the beauty of it - I don't really know the inner workings of Cacti, and I am not a programmer or scripter, but I got it up and running pretty quickly.

SNMP is a great place to start, and very open. It's certainly more reliable than the CIMOM implementations I see.

I'm not sure if you would call it autodiscovery, but Cacti does do an snmpwalk on the devices that you specify, and the pre-built data collection methods that come with it are designed for getting SNMP interface statistics. You can, of course, add other data collection methods, but out of the box, it is basically an interface traffic grapher. You still have to manually input each device that you will collect data for. Once you have specified the basic host information, it gives you a table showing all of the interfaces on that device and a checkbox for each item that can be graphed.

Torrus is configured by feeding it a list of IP addresses and it identifies the device and sets up all the counters to be monitored. The detail is very good, more than just interface stats.

To be fair, though (and this applies to Nagios as well as Cacti), most of the effort you put into setting up a monitoring solution is a one-time thing. It takes time to input all of the devices, but for the most part once the devices to be monitored are specified, that work is over. I think people incorrectly place a lot of emphasis on this or that product's autodiscovery function. Cacti's interface makes it really easy to maintain the configuration, and I think that is a bigger win than autodiscovery.

I consider autodiscovery to be absolutely critical. Maintaining a handful of machines is one thing; hundreds or thousands of machines outside of your control are another. I wrote NACE to allow me to perform fast autodiscovery for Nagios, and I've been pleased to couple it with Torrus so they both have the same list of hosts.

That is probably where our differing environments cause us to need different things. In my environment I monitor hundreds, but not thousands, of devices. And they are all in my control. If I worked for a large ISP, I'm sure I would see things differently.

With Torrus, on a router, for example, what kind of detail would it typically give you outside of the normal interface statistics? Would it be able to discern CPU usage, memory usage, etc. without you specifying some kind of template for it to use as a reference? Cacti has sort of solved this with their data templates. For example, there is a Unix Host Template that you can download and then apply to a device, and it gives you all of the parameters that are built into that template, for example, cpu/mem/disk. But the author of the template had to know the OIDs (and use the correct OIDs). It wasn't really autodiscovered.

Tom

--
Tom Ammon
Network Engineer
Mobile: 801.674.9273
Business Card at http://tomsbox.net/bizcard_TomAmmon.jpg
Center for High Performance Computing
University of Utah
http://www.chpc.utah.edu

___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
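To make the "snmpwalk, then a table of interfaces with checkboxes" step concrete, here is a minimal sketch in Python. It is not Cacti's code. It assumes text in the shape that net-snmp's `snmpwalk -On` prints for the IF-MIB ifDescr column (.1.3.6.1.2.1.2.2.1.2); the sample interface names and the function name `list_interfaces` are made up for illustration.

```python
import re

# Sample text in the shape printed by `snmpwalk -On` for IF-MIB ifDescr.
# The interface names below are invented for this example.
SAMPLE_WALK = """\
.1.3.6.1.2.1.2.2.1.2.1 = STRING: lo
.1.3.6.1.2.1.2.2.1.2.2 = STRING: eth0
.1.3.6.1.2.1.2.2.1.2.3 = STRING: eth1
"""

IF_DESCR_OID = ".1.3.6.1.2.1.2.2.1.2"

# Split each row into column OID, interface index, and description.
LINE_RE = re.compile(r"^(?P<oid>\.[\d.]+)\.(?P<index>\d+) = STRING: (?P<name>.*)$")


def list_interfaces(walk_text):
    """Return {ifIndex: ifDescr} for every ifDescr row in the walk output."""
    interfaces = {}
    for line in walk_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("oid") == IF_DESCR_OID:
            interfaces[int(match.group("index"))] = match.group("name").strip('"')
    return interfaces


if __name__ == "__main__":
    # Print one row per interface, roughly what a "pick what to graph" table needs.
    for index, name in sorted(list_interfaces(SAMPLE_WALK).items()):
        print(f"[ ] ifIndex {index}: {name}")
```

The point of the sketch is that the device still has to be named by hand; only the per-interface rows come from the walk, which matches the distinction drawn above between this and full autodiscovery.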
Re: [Nagios-users] lun monitoring
James Pratt wrote:

-----Original Message-----
From: Russell Adams [mailto:rlad...@adamsinfoserv.com]
Sent: Monday, February 02, 2009 6:38 PM
To: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] lun monitoring

Marc, In the spirit that each tool is supposed to fill one function and do it well, I don't use Nagios for trending. Nagios is operational status monitoring only. I'd suggest you look at other tools for that level of performance. One issue you will have is where will you query it? On certain OS's you can query disk statistics, or you may be able to get the data from the backend storage, or perhaps an aggregate from the SAN switch. I am not aware of any integrated solutions except those high dollar packages sold by storage vendors (ala TPC). Good luck.

I would agree with Russell - Cacti RRDtool (free/open-source) works great for graphing/trending just about anything - free/open-source too... :)

Jamie

I'll give a strong second to that - we use Cacti to graph 10,000+ data sources, and it works great. It's a strong tool.

Tom
Re: [Nagios-users] lun monitoring
Russell Adams wrote:

On Mon, Feb 02, 2009 at 05:22:12PM -0700, Tom Ammon wrote:

I'll give a strong second to that - we use Cacti to graph 10,000+ data sources, and it works great. It's a strong tool.

Tom

Tom, I have progressed through MRTG, Cricket, and now Torrus in my search for a good trending tool. They all use RRDTool because it's simply the best at time series data; the differences are in the frontend. MRTG was the basic model and required complete manual configuration. Cricket was better, with more web layout and a little less configuration. Torrus is what I've settled on. The autodiscovery feature was the selling point. Cacti's web UI is nicer, but I love the autodiscovery. Discovery is fairly easy to customize in XML and Perl.

What has your experience with Cacti been? Do they have good autodiscovery now? How is support for adding new device types? Thanks.

Russell, Cacti is pretty SNMP-centric, but in our environment that is about all we are using it for anyway. I'm no Cacti expert, but to me, that's the beauty of it - I don't really know the inner workings of Cacti, and I am not a programmer or scripter, but I got it up and running pretty quickly.

I'm not sure if you would call it autodiscovery, but Cacti does do an snmpwalk on the devices that you specify, and the pre-built data collection methods that come with it are designed for getting SNMP interface statistics. You can, of course, add other data collection methods, but out of the box, it is basically an interface traffic grapher. You still have to manually input each device that you will collect data for. Once you have specified the basic host information, it gives you a table showing all of the interfaces on that device and a checkbox for each item that can be graphed.

To be fair, though (and this applies to Nagios as well as Cacti), most of the effort you put into setting up a monitoring solution is a one-time thing. It takes time to input all of the devices, but for the most part once the devices to be monitored are specified, that work is over. I think people incorrectly place a lot of emphasis on this or that product's autodiscovery function. Cacti's interface makes it really easy to maintain the configuration, and I think that is a bigger win than autodiscovery.

What do you mean by new device types?

Tom
Re: [Nagios-users] Nagios Status Map
I have used NagVis extensively in our environment and it does a good job of visualizing Nagios status data. I highly recommend it. To make it look really nice, however, you will need to create your own icon sets, which can take some time. The included icon sets work, but I found that they didn't meet our needs.

Tom

Doug Veldhuisen wrote:

NagVis is supposed to be one option to do this. I have not had enough time to check it out myself. I am currently monitoring a couple of hundred devices and my standard map looks terrible.

Doug

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Charles Breite
Sent: Thursday, August 14, 2008 1:46 PM
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] Nagios Status Map

I have started adding icons to our Nagios status map, first by adding parents and then by adding hostextinfo. But the icons on the default (circular markup) status map still overlap and are unreadable. I am slowly adding user-defined coordinates but would like the automatic circular markup map to look good also. Does anyone know of a way to make sure they can't automatically overlap themselves?

Thanks
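One way to speed up the "slowly adding user defined coordinates" step is to generate the coordinates rather than type them. The sketch below is only an illustration of that idea: it does not change how Nagios draws its automatic circular layout. It spaces hosts evenly on a circle whose radius is chosen so that neighbouring icons are at least one icon width apart, and prints values in the form of the `2d_coords` directive from hostextinfo definitions. The icon size, padding, margin, host names, and function names are all assumptions.

```python
import math


def circular_coords(host_names, icon_size=40, padding=10, margin=20):
    """Place hosts evenly on a circle so adjacent icons do not overlap.

    The chord between neighbours on a circle of radius r with n points is
    2 * r * sin(pi / n); the radius is chosen so that the chord is at least
    icon_size + padding. Returns {host_name: (x, y)} in whole pixels.
    """
    n = len(host_names)
    spacing = icon_size + padding
    radius = 0.0 if n <= 1 else spacing / (2 * math.sin(math.pi / n))
    center = radius + margin  # keeps every coordinate non-negative
    coords = {}
    for i, name in enumerate(host_names):
        angle = 2 * math.pi * i / n
        x = round(center + radius * math.cos(angle))
        y = round(center + radius * math.sin(angle))
        coords[name] = (x, y)
    return coords


def format_2d_coords(point):
    """Format an (x, y) pair as a hostextinfo-style 2d_coords line."""
    return f"2d_coords {point[0]},{point[1]}"


if __name__ == "__main__":
    hosts = [f"host{i}" for i in range(12)]  # hypothetical host names
    for name, point in circular_coords(hosts).items():
        print(name, format_2d_coords(point))
```

A single ring gets large quickly with a couple of hundred hosts, so in practice one would probably generate one ring per parent or per site; that extension is left out here.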
Re: [Nagios-users] Freshness checking and a distributed Nagios system.
Nagios forces active checks to be run when used in conjunction with freshness checking, even when active checks for that service are disabled. The docs describe it pretty well at http://nagios.sourceforge.net/docs/2_0/distributed.html under the Freshness Checking section. You have to read it carefully, though.

Tom

Jonathan Call wrote:

Correct me if I'm wrong: In order to run a distributed system, the central server should have active service checks disabled. But freshness checking executes the check command when it doesn't receive a passive response in a timely manner. This means the freshness check never runs. How do you get around that?
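The freshness rule described above can be sketched as a small decision function. This models the behaviour stated in this thread; it is not Nagios source code. The parameter names mirror the `check_freshness` and `freshness_threshold` service directives, while the function names and the returned strings are invented for illustration.

```python
import time


def is_stale(last_result_time, freshness_threshold, now=None):
    """True when the last result is older than the threshold (in seconds)."""
    if now is None:
        now = time.time()
    return (now - last_result_time) > freshness_threshold


def central_server_action(last_result_time, freshness_threshold, now,
                          check_freshness=True):
    """Model what the central server does for a passively fed service.

    Per the reply above, a stale result triggers a forced run of the
    service's check command even though regular active checks for that
    service are disabled.
    """
    if check_freshness and is_stale(last_result_time, freshness_threshold, now):
        return "force active check"
    return "wait for passive result"


if __name__ == "__main__":
    # Last passive result arrived at t=1000; threshold is 300 seconds.
    print(central_server_action(1000, 300, now=1200))
    print(central_server_action(1000, 300, now=1400))
```

On the central server, the check command for such a service is often a small script that simply reports that no passive result arrived in time, so the forced check turns the silence into an alert rather than re-running the real check.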
[Nagios-users] [Fwd: distributed monitoring host checking question]
So, no thoughts on this question?

-------- Original Message --------
Subject: distributed monitoring host checking question
Date: Wed, 30 Jul 2008 01:05:21 -0600
From: Tom Ammon [EMAIL PROTECTED]
To: nagios-users@lists.sourceforge.net

Hi,

I am working on setting up a distributed monitoring system with Nagios (actually Groundwork). I have 3 child servers and 1 parent server, using NSCA to send passive check results from the children to the parent server. My question is about how Nagios (version 2.5) will behave when an on-demand host check needs to be run.

So for example: Host A is configured with check_host_alive (a simple ping) as its host check command on the parent server. It is also configured with Service A, say an SNMP check. Active host checks are not disabled on the parent server, but active service checks are. Host A, obviously, is also configured on the child server. When the child server sends a passive check result up to the parent saying that the SNMP check has failed, will the parent server then run the on-demand host check command to verify that Host A is still up? If not, how do I get that information up to the parent? Are passive host checks my only option?

So I guess the question is this: In a distributed monitoring setup, will a parent server run an on-demand host check for a host that gets a report (via a passive service check sent from a child server) of a service being critical?

Thanks,
Tom
Re: [Nagios-users] [Fwd: distributed monitoring host checking question]
I did, indeed, read the docs. However, the link you posted (or any of the other documentation I found) doesn't answer my question about the relationships between active host checks and passive service checks. I post to lists because maybe the discussion can help someone else at a later time. Sure, I could try it and just figure it out on my own, but what about the community? Aren't we supposed to be helping each other here?

Sean, thanks for your help. If I see any different behavior with Nagios 2.5, I'll post it back to the list.

Tom

Marc Powell wrote:

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:nagios-users- [EMAIL PROTECTED] On Behalf Of Tom Ammon
Sent: Thursday, July 31, 2008 11:25 AM
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] [Fwd: distributed monitoring host checking question]

So, no thoughts on this question?

Documentation- http://nagios.sourceforge.net/docs/2_0/passivechecks.html or take 10 minutes to just test it?

--
Marc
[Nagios-users] distributed monitoring host checking question
Hi,

I am working on setting up a distributed monitoring system with Nagios (actually Groundwork). I have 3 child servers and 1 parent server, using NSCA to send passive check results from the children to the parent server. My question is about how Nagios (version 2.5) will behave when an on-demand host check needs to be run.

So for example: Host A is configured with check_host_alive (a simple ping) as its host check command on the parent server. It is also configured with Service A, say an SNMP check. Active host checks are not disabled on the parent server, but active service checks are. Host A, obviously, is also configured on the child server. When the child server sends a passive check result up to the parent saying that the SNMP check has failed, will the parent server then run the on-demand host check command to verify that Host A is still up? If not, how do I get that information up to the parent? Are passive host checks my only option?

So I guess the question is this: In a distributed monitoring setup, will a parent server run an on-demand host check for a host that gets a report (via a passive service check sent from a child server) of a service being critical?

Thanks,
Tom
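For readers unfamiliar with what the child servers actually send, here is a minimal sketch of the text lines that `send_nsca` reads on standard input when the default tab delimiter is used: host, service description, return code, and plugin output for a service result, and the same without the service description for a host result. It does not answer whether the parent runs an on-demand host check; it only shows what submitting passive service and host results would look like. The function names, host name, and output strings are hypothetical.

```python
# Standard Nagios plugin return codes for services.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def format_service_result(host, service, return_code, output):
    """Build one tab-delimited passive service result line for send_nsca."""
    if return_code not in (OK, WARNING, CRITICAL, UNKNOWN):
        raise ValueError(f"unexpected return code: {return_code}")
    return f"{host}\t{service}\t{return_code}\t{output}\n"


def format_host_result(host, return_code, output):
    """Build one passive host result line (no service description field)."""
    if return_code not in (OK, WARNING, CRITICAL, UNKNOWN):
        raise ValueError(f"unexpected return code: {return_code}")
    return f"{host}\t{return_code}\t{output}\n"


if __name__ == "__main__":
    # What a child server might submit for the example in the question:
    # Service A (an SNMP check) on Host A has gone critical.
    print(format_service_result("HostA", "Service A", CRITICAL,
                                "SNMP CRITICAL - no response"), end="")
    # And, if passive host checks were used, the matching host result.
    print(format_host_result("HostA", OK, "PING OK"), end="")
```

Each line would be piped to `send_nsca` on the child; the exact command-line options depend on the local NSCA configuration and are not shown here.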