On 3/9/09 9:02 PM, Brent Chapman said:

> How does it compare to MRTG, Cricket, Cacti, and the various other
> commonly used monitoring systems that have been mentioned in this
> thread?  Easier or harder to configure and maintain?  More or less
> efficient in its SNMP queries?  More or less of a load on the
> monitoring host?  And so forth.

We only use munin to do trending on each individual host. For that, it seems to work great, and it gives us direct access to all the performance and trending data we could want on a per-machine basis. But that does mean installing it on each machine, and we don't have a corresponding event/system monitoring application. We don't have a central munin instance monitoring multiple hosts, as we might otherwise do with rrdtool and cricket, cacti, etc. I'm sure it's capable of doing that, but we don't use it that way.

At UT Austin, we have previously used Nagios as a service/event monitoring system, and we haven't taken that as far as we should have. We definitely haven't made anywhere near as much use of the plugins, etc. The Nagios system we have today monitors a few hundred machines, and I don't think it would scale much beyond that, but that's with using almost exclusively active host and service checks, very few SNMP queries, and no passive checks. I think it's also running on an ancient SunFire V240 running Solaris 9, as opposed to a much more modern and powerful machine.

We're in the process of setting up a new network monitoring system, and we're going with Zenoss instead of something like OpsView or another tool that is built on top of Nagios. I would have been happy with improved use of Nagios, or even something like OpsView, but the guy who is now in charge of the network monitoring system has prior experience with these tools and this was his choice. However, we haven't actually set up Zenoss yet, so we don't really know for sure how well it will work in our environment. I trust the guy who's leading this project, because I know what his background is. However, that doesn't directly translate into my ability to tell you how well it currently works.

At a recent LOPSA-Austin meeting, we had a presentation from Matt Ray (the Zenoss Community Manager, and a local Austinite) about what Zenoss can do and what is planned for the future, and it certainly seems like an amazingly capable SNMP-based monitoring tool. However, the current version of the tool is very limited if you don't have some sort of SNMP agent on the device being monitored. They're working on making it capable of automatically installing its own host-level agents (i.e., all you have to do is give it an ssh key to log in and then it can upload whatever tools it needs to run), but that's not yet available.

Moreover, there currently isn't any inference engine or dependency system, so if a router goes down you'll get warnings not only about the router being down, but also about all the gear behind it. My understanding is that the current version also can't distribute the monitoring load across multiple machines and then feed that data up into a consolidated central system.

I'm sure that Zenoss will fix all these problems, because it does seem to be evolving very rapidly and has a very active consumer/contributor base. And maybe the Enterprise version will be more than we need for the near future, and we'll be able to grow into the other features we will need. We certainly don't do any inference engines/dependencies with Nagios today (although it is capable of doing that), and we don't make use of distributed monitoring and collecting (although Nagios can also do that).

However, in the meantime, one of my co-workers is setting up cacti as an intermediate trending tool for monitoring some of the NetApp filers, and this is progressing quickly because he's done this a couple of times before -- including the bizarre crap that he has to do because NetApp's MIB uses pairs of 32-bit counters as pseudo 64-bit counters, so you've got to work up a way to do your own rollover math.

> Here's why I'm asking...
> Real Soon Now (hopefully later this month),
> I'm going to be making the initial release of the automated network
> config generation tool that I've been working on (see
> http://www.netomata.com/products/ncg).

Unfortunately, all WAN networking on campus is managed by a central networking group (see <http://www.utexas.edu/its/about/org/networking.php>), but all LAN networking is managed by the respective facilities/operations groups in the respective College or Department that owns the space in question. For the central ITS Department, all our LAN networking is handled by the Operations group (see <http://www.utexas.edu/its/about/org/operations.php>), which is in a different division than the Networking group (although both fall under the rubric of ITS Operations, see <http://www.utexas.edu/its/about/org/>).

The central networking group also handles all the DNS servers and services across campus, although each local facilities/operations group will have their own Technical Service Contacts who can make official submissions to the central networking group on their behalf. Edge and core routers are owned and operated by Networking, while switches and internal leaf routers are owned and operated by the respective facilities/operations groups.

So, we really don't have a single group who could look at your tool and test it out. The central networking group would come the closest, since they do provide advice to all the local facilities/operations groups (including our own University Data Center personnel), but they don't do the day-to-day management of those systems. And both of those groups are separate from ITS-Systems, which manages most of the centralized servers and services for the campus.

I work in the Unix group, which is one of three subgroups within ITS-Systems. We are looking at configuration management tools, but we don't need any network configuration features; we need something that can help us do host configuration maintenance.
Towards that end, we looked at cfengine, puppet, and bcfg2, and have settled on the last of these three. If your tool could help us solve the same kinds of problems as these three, then maybe we should be taking a look at it.

> I'm working on an example of
> the tool's use to share, where it generates all the config files for a
> small one-rack web hosting operation (routers, switches, firewalls,
> load balancers, Xen servers, Xen guests, DNS domains, etc.).  I want
> to extend the example to include generating config files for a
> monitoring system (Munin, MRTG, Cricket, Cacti, or something similar),
> and I'm trying to decide which system to use for my example.  Ideally,
> to maximize the useful life of the example, that would be the
> monitoring system that most folks _wish_ they were using (not
> necessarily the system that they _are_ using).

My guess is that if you cover Nagios and Zenoss, that will be the majority of the OSS community, at least as far as network/system monitoring tools go.

> So, I welcome attempts to convince me why I should choose one package
> or another to be the first one I implement support for in the
> example... ;-)  Eventually, somebody (maybe me, maybe somebody else)
> will probably implement support for the various other monitoring
> packages as well, and show how easy it is with our tool to swap out
> one for another, but I have to choose one to start with.

What do you know best? That's where I would be inclined to start.

-- 
Brad Knowles <[email protected]>
If you like Jazz/R&B guitar, check out my friend bigsbytracks on YouTube at http://preview.tinyurl.com/bigsbytracks
LinkedIn Profile: <http://tinyurl.com/y8kpxu>

_______________________________________________
Discuss mailing list
[email protected]
http://lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
http://lopsa.org/
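
P.S. On the NetApp counter pairs mentioned earlier, the rollover math might be sketched roughly as follows. This is only a minimal Python sketch under stated assumptions, not the actual Cacti setup described above: it assumes each value arrives as a "high" and a "low" 32-bit half, that either half may be reported as a signed integer, and that the combined value wraps at 2^64. The function names (combine_halves, counter_delta, rate_per_second) are hypothetical and exist only for illustration.

```python
MASK32 = 0xFFFFFFFF   # keeps the low 32 bits of a value
WRAP64 = 1 << 64      # assumed wrap point of the combined counter


def combine_halves(high, low):
    """Join assumed high/low 32-bit halves into one 64-bit value."""
    # Masking guards against halves that were reported as signed integers.
    return ((high & MASK32) << 32) | (low & MASK32)


def counter_delta(prev, curr, modulus=WRAP64):
    """Increase between two samples, tolerating a single wraparound."""
    # Modular subtraction yields the right delta even when curr < prev.
    return (curr - prev) % modulus


def rate_per_second(prev_pair, curr_pair, interval_seconds):
    """Per-second rate from two (high, low) samples taken interval_seconds apart."""
    prev = combine_halves(*prev_pair)
    curr = combine_halves(*curr_pair)
    return counter_delta(prev, curr) / interval_seconds


if __name__ == "__main__":
    # The low half rolled over between samples and the high half ticked up by one.
    print(rate_per_second((0, 4294967295), (1, 99), 100))
```

One caveat with any scheme like this: if the two halves are fetched in separate SNMP requests, the low half can roll over between the reads and produce an inconsistent pair, so it is probably safer to fetch both in a single request and to discard samples whose delta looks implausibly large.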
