Re: [lopsa-discuss] Recommendations for a Network Service Monitor?

Brad Knowles Mon, 09 Mar 2009 21:41:14 -0700

on 3/9/09 9:02 PM, Brent Chapman said:

> How does it compare to MRTG, Cricket, Cacti, and the various other
> commonly used monitoring systems that have been mentioned in this
> thread?  Easier or harder to configure and maintain?  More or less
> efficient in its SNMP queries?  More or less of a load on the
> monitoring host?  And so forth.


We only use munin to do trending on each individual host.  And for that 
part, it seems to work great, and gives us direct access to all the 
performance and trending data we could want on a per-machine basis.  But 
that does mean installing it on each machine, and we don't have a 
corresponding event/system monitoring application.

We don't have a central munin instance monitoring multiple hosts, like 
we might otherwise do with rrdtool and cricket, cacti, etc.  I'm 
sure it's capable of doing that, but we don't use it that way.


At UT Austin, we have previously used Nagios as a service/event
monitoring system, but we haven't taken that as far as we should have.
We definitely haven't made anywhere near as much use of the plugins as
we could have.  The Nagios system we have today monitors a few hundred
machines, and I don't think it would scale much beyond that, but that's
with almost exclusively active host and service checks, very few SNMP
queries, and no passive checks.  I think it's also running on an
ancient SunFire V240 running Solaris 9, as opposed to a much more
modern and powerful machine.

We're in the process of setting up a new network monitoring system, and
we're going with Zenoss instead of something like OpsView or some other
tool that is built on top of Nagios.  I would have been happy with
improved use of Nagios, or even something like OpsView, but the guy who 
is now in charge of the network monitoring system has prior experience 
with these tools and this was his choice.

However, we haven't actually set up Zenoss yet, so we don't really know 
for sure how well it will work in our environment.  I trust the guy 
who's leading this project, because I know what his background is. 
However, that doesn't directly translate into my ability to tell you how 
well it currently works.


At a recent LOPSA-Austin meeting, we had a presentation from Matt Ray 
(the Zenoss Community Manager, and a local Austinite) about what Zenoss 
can do and what is planned for the future, and it certainly seems like 
an amazingly capable SNMP-based monitoring tool.  However, the current 
version of the tool is very limited if you don't have some sort of SNMP 
agent on the device being monitored.  They're working on making it 
capable of automatically installing its own host-level agents (i.e., 
all you have to do is give it an ssh key to log in and then it can 
upload whatever tools it needs to run), but that's not yet available.

Moreover, there currently isn't any inference engine or dependency
system, so if a router goes down you'll get warnings not only about
the router being down, but also about all the gear behind it.  My
understanding is that the current version also can't distribute the
monitoring load across multiple machines and then feed that data up
into a consolidated central system.
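
To make the dependency point concrete, here is a minimal sketch in
Python of the kind of suppression I mean.  It is not how Zenoss or
Nagios implements it; the host names and the parents map are made up
for illustration, and it assumes the topology has no loops.  The idea
is the same as Nagios host parents: a host whose parents have all
failed is treated as unreachable rather than down, so only the root
cause gets reported.

```python
from typing import Dict, List, Set


def root_cause_alerts(parents: Dict[str, List[str]], failed: Set[str]) -> Set[str]:
    """Return the failed hosts that are worth alerting on.

    parents maps each host to the hosts it sits behind (hypothetical
    data, supplied by whoever describes the topology).  failed is the
    set of hosts whose checks are currently failing.
    """

    def is_unreachable(host: str) -> bool:
        upstream = parents.get(host, [])
        # A host with no parents is directly attached to the monitor,
        # so a failure there is always a real "down".
        if not upstream:
            return False
        # If every parent has failed, we cannot tell whether this host
        # is really down, so its alert is suppressed.
        return all(parent in failed for parent in upstream)

    return {host for host in failed if not is_unreachable(host)}


# Hypothetical one-rack topology: everything sits behind router1.
topology = {
    "router1": [],
    "switch1": ["router1"],
    "web1": ["switch1"],
    "web2": ["switch1"],
}

# The router dies, so every check behind it fails as well.
print(root_cause_alerts(topology, {"router1", "switch1", "web1", "web2"}))
```

With the router and everything behind it failing, only router1 comes
back; if web1 fails on its own, web1 comes back.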

I'm sure that Zenoss will fix all these problems, because it does seem 
to be evolving very rapidly and has a very active consumer/contributor 
base.  And maybe the Enterprise version will be more than we need for 
the near future, and we'll be able to grow into the other features we 
will need.  We certainly don't use any inference or dependency
features with Nagios today (although it is capable of doing that), and
we don't make use of distributed monitoring and collecting (although
Nagios can also do that).


However, in the meantime, one of my co-workers is setting up cacti as 
an intermediate trending tool for monitoring some of the NetApp filers, 
and this is progressing quickly because he's done this a couple of times 
before -- including the bizarre crap that he has to do because NetApp's 
MIB uses pairs of 32-bit counters as pseudo 64-bit counters, so you've 
got to work up a way to do your own rollover math.
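
For anyone who hasn't had to do it, the rollover math looks roughly
like the following.  This is a minimal Python sketch, not my
co-worker's actual cacti setup; the function names are made up, and it
assumes each counter arrives as a separate "high" and "low" 32-bit
value and that the combined counter wraps at most once between polls.

```python
WRAP_64 = 1 << 64  # modulus of the combined (pseudo) 64-bit counter
MASK_32 = 0xFFFFFFFF


def combine_counter(high: int, low: int) -> int:
    """Join the high and low 32-bit halves into one 64-bit value."""
    return ((high & MASK_32) << 32) | (low & MASK_32)


def counter_delta(previous: int, current: int, modulus: int = WRAP_64) -> int:
    """Difference between two samples, tolerating a single wrap.

    If the counter wrapped between polls, current is smaller than
    previous and the modulo arithmetic adds the missing modulus back.
    """
    return (current - previous) % modulus


def rate_per_second(previous: int, current: int, seconds: float) -> float:
    """Average rate over the polling interval."""
    if seconds <= 0:
        raise ValueError("polling interval must be positive")
    return counter_delta(previous, current) / seconds


# Two hypothetical polls taken 300 seconds apart.
earlier = combine_counter(high=1, low=4294967000)
later = combine_counter(high=2, low=704)
print(counter_delta(earlier, later))
print(rate_per_second(earlier, later, 300))
```

One caveat: if the two halves are fetched in separate SNMP requests,
the low half can wrap in between and you get an inconsistent pair, so
it's safer to fetch both in the same request and to sanity-check any
implausibly large delta.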

> Here's why I'm asking...  Real Soon Now (hopefully later this month),
> I'm going to be making the initial release of the automated network
> config generation tool that I've been working on (see
> http://www.netomata.com/products/ncg).

Unfortunately, all WAN networking on campus is managed by a central 
networking group (see 
<http://www.utexas.edu/its/about/org/networking.php>), but all LAN 
networking is managed by the respective facilities/operations groups in 
the respective College or Department that owns the space in question.

For the central ITS Department, all our LAN networking is handled by the 
Operations group (see 
<http://www.utexas.edu/its/about/org/operations.php>), which is in a 
different division than the Networking group (although both fall under 
the rubric of ITS Operations, see <http://www.utexas.edu/its/about/org/>).

The central networking group also handles all the DNS servers and 
services across campus, although each local facilities/operations group 
will have their own Technical Service Contacts who can make official 
submissions to the central networking group on their behalf.


Edge and core routers are owned and operated by Networking, while 
switches and internal leaf routers are owned and operated by the 
respective facilities/operations groups.

So, we really don't have a single group who could look at your tool and 
test it out.  The central networking group would come the closest, since 
they do provide advice to all the local facilities/operations groups 
(including our own University Data Center personnel), but they don't do 
the day-to-day management of those systems.

And both of those groups are separate from ITS-Systems, which manages
most of the centralized servers and services for the campus.  I work in
the Unix group, which is one of three subgroups within ITS-Systems.  We
are looking at configuration management tools, but we don't need any
network configuration features; we need something that can help us do
host configuration maintenance.  Towards that end, we looked at
cfengine, puppet, and bcfg2, and have settled on the last of these
three.

If your tool could help us solve the same kinds of problems as these 
three, then maybe we should be taking a look at it.

>                                         I'm working on an example of
> the tool's use to share, where it generates all the config files for a
> small one-rack web hosting operation (routers, switches, firewalls,
> load balancers, Xen servers, Xen guests, DNS domains, etc.).  I want
> to extend the example to include generating config files for a
> monitoring system (Munin, MRTG, Cricket, Cacti, or something similar),
> and I'm trying to decide which system to use for my example.  Ideally,
> to maximize the useful life of the example, that would be the
> monitoring system that most folks _wish_ they were using (not
> necessarily the system that they _are_ using).

My guess is that if you cover Nagios and Zenoss, that will be the 
majority of the OSS community, at least as far as network/system 
monitoring tools.

> So, I welcome attempts to convince me why I should choose one package
> or another to be the first one I implement support for in the
> example... ;-)  Eventually, somebody (maybe me, maybe somebody else)
> will probably implement support for the various other monitoring
> packages as well, and show how easy it is with our tool to swap out
> one for another, but I have to choose one to start with.

What do you know best?  That's where I would be inclined to start.

-- 
Brad Knowles
<[email protected]>        If you like Jazz/R&B guitar, check out
LinkedIn Profile:                 my friend bigsbytracks on YouTube at
<http://tinyurl.com/y8kpxu>    http://preview.tinyurl.com/bigsbytracks