Re: [Nagios-users] lun monitoring

2009-02-03 Thread Tom Ammon


Russell Adams wrote:
 On Mon, Feb 02, 2009 at 07:10:45PM -0700, Tom Ammon wrote:
   
 Russell,

 Cacti is pretty SNMP-centric, but in our environment that is about all 
 we are using it for anyway. I'm no cacti expert, but to me, that's the 
 beauty of it - I don't really know the inner workings of cacti, and I am 
 not a programmer or scripter, but I got it up and running pretty quickly.
 

 SNMP is a great place to start, and very open. It's certainly more
 reliable than the CIMOM implementations I see.

   
 I'm not sure if you would call it autodiscovery, but cacti does do an 
 snmpwalk on the devices that you specify, and the pre-built data 
 collection methods that come with it are designed for getting snmp 
 interface statistics. You can, of course, add other data collection 
 methods, but out of the box, it is basically an interface traffic 
 grapher. You still have to manually input each device that you will 
 collect data for. Once you have specified the basic host information, it 
 gives you a table showing all of the interfaces on that device and a 
 checkbox for each item that can be graphed.
 

 Torrus is configured by feeding it a list of IP addresses and it
 identifies the device and sets up all the counters to be
 monitored. The detail is very good, more than just interface stats.

   
 To be fair, though (and this applies to nagios as well as cacti) most of 
 the effort you put in to setting up a monitoring solution is a one-time 
 thing. It takes time to input all of the devices, but for the most part 
 once the devices to be monitored are specified, that work is over. I 
 think people incorrectly place a lot of emphasis on this or that 
 product's autodiscovery function. Cacti's interface makes it really easy 
 to maintain the configuration, and I think that is a bigger win than 
 autodiscovery.
 

 I consider autodiscovery to be absolutely critical. Maintaining a
 handful of machines is one thing; hundreds or thousands of machines
 outside of your control are another. I wrote NACE to allow me to
 perform fast autodiscovery for Nagios, and I've been pleased to couple
 it with Torrus so they both have the same list of hosts.
   
That is probably where our differing environments cause us to need 
different things. In my environment I monitor hundreds, but not 
thousands of devices. And they are all in my control. If I worked for a 
large ISP, I'm sure I would see things differently.

With Torrus, on a router, for example, what kind of detail would it 
typically give you outside of the normal interface statistics? Would it 
be able to discern cpu usage, memory usage, etc. without you specifying 
some kind of template for it to use as a reference?

Cacti has sort of solved this with their data templates. For example, 
there is a Unix Host Template that you can download and then apply to a 
device, and it gives you all of the parameters that are built in that 
template, for example, cpu/mem/disk. But the author of the template had 
to know the OIDs (and use the correct OIDs). It wasn't really 
autodiscovered.
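
To make the template idea concrete, here is a minimal sketch in Python of what such a template amounts to: a fixed list of metric names and OIDs that someone had to look up by hand. This is not Cacti's actual template format. The names UNIX_HOST_TEMPLATE and apply_template are made up for illustration, and the OIDs are the UCD-SNMP-MIB ones as I understand them, so verify them against the agent you actually poll.

```python
# Hypothetical template: metric name -> OID chosen by the template author.
# Nothing here is discovered from the device; that is the point being made.
UNIX_HOST_TEMPLATE = {
    "cpu_idle_percent": ".1.3.6.1.4.1.2021.11.11.0",   # ssCpuIdle (assumed)
    "mem_total_kb": ".1.3.6.1.4.1.2021.4.5.0",         # memTotalReal (assumed)
    "mem_avail_kb": ".1.3.6.1.4.1.2021.4.6.0",         # memAvailReal (assumed)
    "disk_used_percent": ".1.3.6.1.4.1.2021.9.1.9.1",  # dskPercent, first entry (assumed)
}


def apply_template(host, template):
    """Expand a template into one data source per metric for the given host."""
    return [
        {"host": host, "metric": metric, "oid": oid}
        for metric, oid in sorted(template.items())
    ]


if __name__ == "__main__":
    for source in apply_template("unixhost01", UNIX_HOST_TEMPLATE):
        print(source["host"], source["metric"], source["oid"])
```

If an OID in the table is wrong for a given platform, every device the template is applied to inherits the mistake, which is the trade-off compared with true autodiscovery.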

Tom


-- 
-
Tom Ammon
Network Engineer
Mobile: 801.674.9273

Business Card at http://tomsbox.net/bizcard_TomAmmon.jpg

Center for High Performance Computing
University of Utah
http://www.chpc.utah.edu


___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] lun monitoring

2009-02-02 Thread Tom Ammon


James Pratt wrote:
 -Original Message-
 From: Russell Adams [mailto:rlad...@adamsinfoserv.com]
 Sent: Monday, February 02, 2009 6:38 PM
 To: nagios-users@lists.sourceforge.net
 Subject: Re: [Nagios-users] lun monitoring

 Marc,

 In the spirit that each tool is supposed to fill one function and do
 it well, I don't use Nagios for trending. Nagios is operational status
 monitoring only.

 I'd suggest you look at other tools for that level of performance. One
 issue you will have is where will you query it? On certain OS's you can
 query disk statistics, or you may be able to get the data from the
 backend storage, or perhaps an aggregate from the SAN switch.

 I am not aware of any integrated solutions except those high dollar
 packages sold by storage vendors (ala TPC).

 Good luck.

 

 I would agree with Russell - Cacti and RRDtool (free/open-source) work
 great for graphing/trending just about anything... :)

 Jamie
   

I'll give a strong second to that - we use Cacti to graph 10,000+ data 
sources, and it works great. It's a strong tool.

Tom



Re: [Nagios-users] lun monitoring

2009-02-02 Thread Tom Ammon


Russell Adams wrote:
 On Mon, Feb 02, 2009 at 05:22:12PM -0700, Tom Ammon wrote:
   
 I'll give a strong second to that - we use Cacti to graph 10,000+ data 
 sources, and it works great. It's a strong tool.

 Tom
 

 Tom,

 I have progressed through MRTG, Cricket, and now Torrus in my search
 for a good trending tool. They all use RRDTool because it's simply the
 best at time series data; the differences are in the frontend.

 MRTG was the basic model, required complete manual configuration.

 Cricket was better, more web layout and a little less configuration.

 Torrus is what I've settled on. The autodiscovery feature was the
 selling point. Cacti's web UI is nicer, but I love the
 autodiscovery. Discovery is fairly easy to customize in XML and Perl.

 What has your experience with Cacti been? Do they have good
 autodiscovery now? How is support for adding new device types?

 Thanks.


Russell,

Cacti is pretty SNMP-centric, but in our environment that is about all 
we are using it for anyway. I'm no cacti expert, but to me, that's the 
beauty of it - I don't really know the inner workings of cacti, and I am 
not a programmer or scripter, but I got it up and running pretty quickly.

I'm not sure if you would call it autodiscovery, but cacti does do an 
snmpwalk on the devices that you specify, and the pre-built data 
collection methods that come with it are designed for getting snmp 
interface statistics. You can, of course, add other data collection 
methods, but out of the box, it is basically an interface traffic 
grapher. You still have to manually input each device that you will 
collect data for. Once you have specified the basic host information, it 
gives you a table showing all of the interfaces on that device and a 
checkbox for each item that can be graphed.
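
As a rough illustration of the walk-then-pick workflow described above, the Python sketch below parses text in the style that net-snmp's snmpwalk prints for IF-MIB::ifDescr and turns it into a list of rows that could each be ticked for graphing. It is not how Cacti is implemented. The sample interface names and the function names parse_if_descr and graphable_rows are invented for the example.

```python
import re

# Sample text in the style of `snmpwalk` output for IF-MIB::ifDescr.
# The interface names are made up for illustration.
SAMPLE_WALK = """\
IF-MIB::ifDescr.1 = STRING: lo
IF-MIB::ifDescr.2 = STRING: GigabitEthernet0/1
IF-MIB::ifDescr.3 = STRING: GigabitEthernet0/2
"""

LINE_RE = re.compile(r"^IF-MIB::ifDescr\.(\d+) = STRING: (.*)$")


def parse_if_descr(walk_output):
    """Return {ifIndex: description} for every ifDescr line in the walk output."""
    interfaces = {}
    for line in walk_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            interfaces[int(match.group(1))] = match.group(2)
    return interfaces


def graphable_rows(interfaces):
    """Build one row per interface, unselected by default, ordered by ifIndex."""
    return [
        {"index": index, "name": name, "selected": False}
        for index, name in sorted(interfaces.items())
    ]


if __name__ == "__main__":
    for row in graphable_rows(parse_if_descr(SAMPLE_WALK)):
        print(row)
```

The device itself still has to be entered by hand; only the interface list comes from the walk.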

To be fair, though (and this applies to nagios as well as cacti) most of 
the effort you put in to setting up a monitoring solution is a one-time 
thing. It takes time to input all of the devices, but for the most part 
once the devices to be monitored are specified, that work is over. I 
think people incorrectly place a lot of emphasis on this or that 
product's autodiscovery function. Cacti's interface makes it really easy 
to maintain the configuration, and I think that is a bigger win than 
autodiscovery.

What do you mean by new device types?

Tom



Re: [Nagios-users] Nagios Status Map

2008-08-15 Thread Tom Ammon
I have used Nagvis extensively in our environment and it does a good job
of visualizing nagios status data.

I highly recommend it. To make it look really nice, however, you will
need to create your own icon sets, which can take some time. The
included icon sets work, but I found that they didn't meet our needs.

Tom

Doug Veldhuisen wrote:

 NagVis is supposed to be one option for this. I have not had enough time 
 to check it out myself.

 I am currently monitoring a couple of hundred devices and my standard 
 map looks terrible.

  

 Doug

  

  

 From: [EMAIL PROTECTED] On Behalf Of Charles Breite
 Sent: Thursday, August 14, 2008 1:46 PM
 To: nagios-users@lists.sourceforge.net
 Subject: [Nagios-users] Nagios Status Map

  

  

 I have started adding icons to our Nagios status map, first by adding 
 parents and then by adding hostextinfo. But the icons on the default 
 (circular markup) status map still overlap and are unreadable. I am 
 slowly adding user-defined coordinates but would like the automatic 
 circular markup map to look good also.

 Does anyone know of a way to make sure they can't automatically overlap?
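
For the user-defined coordinates part, one way to avoid placing icons by hand is to generate the coordinates. The Python sketch below lays hosts out on a simple grid and prints hostextinfo definitions with a 2d_coords directive. It does not change how the circular layout behaves; it only helps when the status map is switched to user-supplied coordinates. The grid spacing values and the function names grid_coords and hostextinfo_block are assumptions chosen for illustration.

```python
def grid_coords(host_names, columns=10, cell_width=100, cell_height=80, margin=40):
    """Assign each host a distinct (x, y) position on a grid so icons cannot overlap."""
    coords = {}
    for position, name in enumerate(host_names):
        row, col = divmod(position, columns)
        coords[name] = (margin + col * cell_width, margin + row * cell_height)
    return coords


def hostextinfo_block(name, x, y):
    """Render one hostextinfo definition carrying the 2d_coords directive."""
    return (
        "define hostextinfo{\n"
        f"\thost_name\t{name}\n"
        f"\t2d_coords\t{x},{y}\n"
        "\t}\n"
    )


if __name__ == "__main__":
    hosts = ["router1", "switch1", "switch2"]  # hypothetical host names
    for name, (x, y) in grid_coords(hosts, columns=2).items():
        print(hostextinfo_block(name, x, y))
```

The cell size should be at least as large as the icon plus its label, otherwise neighbouring entries can still touch.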

 Thanks

  

  

 



Re: [Nagios-users] Freshness checking and a distributed Nagios system.

2008-08-15 Thread Tom Ammon
Nagios forces active checks to be run when used in conjunction with 
freshness checking, even when active checks for that service are 
disabled. The docs describe it pretty well at 
http://nagios.sourceforge.net/docs/2_0/distributed.html under the 
Freshness Checking section. You have to read it carefully, though.
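
The decision described above can be sketched in a few lines of Python. This is a simplified model of the behaviour as the docs describe it, not Nagios source code. The field names mirror the check_freshness, freshness_threshold and active_checks_enabled directives; the ServiceState class and the function names are made up for the example, and the automatic threshold calculation Nagios does when the threshold is left at 0 is not modelled.

```python
from dataclasses import dataclass


@dataclass
class ServiceState:
    check_freshness: bool        # mirrors the check_freshness directive
    freshness_threshold: int     # seconds; mirrors freshness_threshold
    active_checks_enabled: bool  # mirrors active_checks_enabled
    last_result_time: float      # when the last (passive) result arrived


def is_stale(state, now):
    """A result is stale once it is older than the freshness threshold."""
    if not state.check_freshness:
        return False
    return (now - state.last_result_time) > state.freshness_threshold


def next_action(state, now):
    """Decide what the central server does for this service at time `now`."""
    if is_stale(state, now):
        # Forced even though active checks are disabled for the service.
        return "force_active_check"
    if state.active_checks_enabled:
        return "schedule_active_check"
    return "wait_for_passive_result"


if __name__ == "__main__":
    svc = ServiceState(True, 300, False, last_result_time=1000.0)
    print(next_action(svc, now=1200.0))  # result is still fresh
    print(next_action(svc, now=1301.0))  # result is older than the threshold
```

In a distributed setup the forced check_command on the central server is often a dummy command that just returns a "no results received" state, rather than a real check.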

Tom

Jonathan Call wrote:
 Correct me if I'm wrong:
 In order to run a distributed system, the central server should have
 active service checks disabled. But freshness checking executes the
 check command when it doesn't receive a passive response in a timely
 manner. This means the freshness check never runs.

 How do you get around that?




[Nagios-users] [Fwd: distributed monitoring host checking question]

2008-07-31 Thread Tom Ammon
So, no thoughts on this question?

-------- Original Message --------
Subject: distributed monitoring host checking question
Date: Wed, 30 Jul 2008 01:05:21 -0600
From: Tom Ammon [EMAIL PROTECTED]
To: nagios-users@lists.sourceforge.net



Hi,

I am working on setting up a distributed monitoring system with Nagios 
(actually Groundwork). I have 3 child servers and 1 parent server, using 
NSCA to send passive check results from the children to the parent server.

My question is about how Nagios (version 2.5) will behave when an 
on-demand host check needs to be run.

So for example:

Host A is configured with check_host_alive ( a simple ping ) as its host 
check command on the parent server. It is also configured with Service 
A, say an SNMP check. Active host checks are not disabled on the parent 
server, but active service checks are.

Host A, obviously, is also configured on the child server. When the 
child server sends a passive check result up to the parent saying that 
the SNMP check has failed, will the parent server then run the on-demand 
host check command to verify that Host A is still up? If not, how do I 
get that information up to the parent? Are passive host checks my only 
option?

So I guess the question is this: In a distributed monitoring setup, will 
a parent server run an on-demand host check for a host that gets a 
report (via a passive service check sent from a child server) of a 
service being critical?

Thanks,

Tom



Re: [Nagios-users] [Fwd: distributed monitoring host checking question]

2008-07-31 Thread Tom Ammon
I did, indeed, read the docs. However, neither the link you posted nor any 
of the other documentation I found answers my question about the 
relationship between active host checks and passive service checks.

I post to lists because maybe the discussion can help someone else at a 
later time. Sure, I could try it and just figure it out on my own, but 
what about the community? Aren't we supposed to be helping each other here?

Sean, thanks for your help. If I see any different behavior with nagios 
2.5, I'll post it back to the list.

Tom

Marc Powell wrote:
   
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:nagios-users-
 [EMAIL PROTECTED] On Behalf Of Tom Ammon
 Sent: Thursday, July 31, 2008 11:25 AM
 To: nagios-users@lists.sourceforge.net
 Subject: [Nagios-users] [Fwd: distributed monitoring host checking
 question]

 So, no thoughts on this question?
 

 Documentation- http://nagios.sourceforge.net/docs/2_0/passivechecks.html
 or take 10 minutes to just test it?

 --
 Marc



[Nagios-users] distributed monitoring host checking question

2008-07-30 Thread Tom Ammon
Hi,

I am working on setting up a distributed monitoring system with Nagios 
(actually Groundwork). I have 3 child servers and 1 parent server, using 
NSCA to send passive check results from the children to the parent server.
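
For reference, the lines a child feeds to send_nsca are tab-separated: host, service description, return code and plugin output for a service result, and host, return code and plugin output for a host result. The Python sketch below only builds those lines; it does not call send_nsca. The function names and the sample host and service names are made up for the example.

```python
SERVICE_CODES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}
HOST_CODES = {0: "UP", 1: "DOWN", 2: "UNREACHABLE"}


def service_result_line(host, service, return_code, output):
    """Build one passive service check result line: host, service, code, output."""
    if return_code not in SERVICE_CODES:
        raise ValueError(f"unexpected service return code: {return_code}")
    return f"{host}\t{service}\t{return_code}\t{output}\n"


def host_result_line(host, return_code, output):
    """Build one passive host check result line: host, code, output."""
    if return_code not in HOST_CODES:
        raise ValueError(f"unexpected host return code: {return_code}")
    return f"{host}\t{return_code}\t{output}\n"


if __name__ == "__main__":
    # Hypothetical names; the parent must have matching host and service definitions.
    print(service_result_line("hostA", "SNMP check", 2, "CRITICAL - no SNMP response"), end="")
    print(host_result_line("hostA", 1, "DOWN - ping timed out"), end="")
```

The second function is the shape a passive host check would take if the child, rather than the parent, ends up reporting host state.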

My question is about how Nagios (version 2.5) will behave when an 
on-demand host check needs to be run.

So for example:

Host A is configured with check_host_alive ( a simple ping ) as its host 
check command on the parent server. It is also configured with Service 
A, say an SNMP check. Active host checks are not disabled on the parent 
server, but active service checks are.

Host A, obviously, is also configured on the child server. When the 
child server sends a passive check result up to the parent saying that 
the SNMP check has failed, will the parent server then run the on-demand 
host check command to verify that Host A is still up? If not, how do I 
get that information up to the parent? Are passive host checks my only 
option?

So I guess the question is this: In a distributed monitoring setup, will 
a parent server run an on-demand host check for a host that gets a 
report (via a passive service check sent from a child server) of a 
service being critical?

Thanks,

Tom
