Very interesting! We're currently using 3.2.3 with NagiosGraph on our legacy CentOS system. Which version has these kinds of changes in it? I want to make sure I'm running the right version when I set up the Chef test config.
Also, if NRPE is no longer in use, what has replaced it? I've heard of Icinga before, but not Shinken; I'll have to take a look at those as well. One of the problems with the historical Nagios GUI is that you can't do things like bulk-acknowledge alerts: you have to go through them one by one and ack each individually. I believe Icinga allows you to do that. I had also read that Nagios was moving towards a PHP-based front end that would let them iterate more quickly on front-end improvements, but I'm not sure whether that has actually taken place.

--
~*~ StormeRider ~*~

"Every world needs its heroes [...] They inspire us to be better than we are. And they protect from the darkness that's just around the corner."

(from Smallville Season 6x1: "Zod")

On why I hate the phrase "that's so lame"... http://bit.ly/Ps3uSS

On Sun, Mar 24, 2013 at 11:30 AM, Florian Heigl <[email protected]> wrote:

> Hi Morgan,
>
> 2013/3/24 Morgan Blackthorne <[email protected]>
>
>> This is a spin-off question related to the other monitoring system thread
>> we have going, taking it from the general direction towards a specific
>> use-case scenario.
>>
>> I've used several systems throughout the years, the last two notably
>> being Nagios and Zabbix. Nagios seems better suited for monitoring, while
>> Zabbix is clearly superior in terms of graphing. Configuring NagiosGraph
>> is... more difficult than it should be, IMO. The Zabbix agent seemed to be
>> less reliable than NRPE, however, and last I worked with Zabbix it seemed
>> to default to not alerting unless explicitly configured to do so. (It's
>> been a while since we moved away from it, so my memory is a bit foggy. Near
>> as I can recall, a configured alarm via a Zabbix agent check would not fire
>> if the agent itself was not reachable, and the system did not natively
>> support the concept of a "host down" alert in that situation, either.
>> You had to manually configure a check of the network interfaces and the
>> agent itself, which seemed very counter-intuitive, and led to many
>> situations where we hadn't properly thought through all failure scenarios
>> to configure the alarms explicitly enough. All that said, I know some of
>> the issues we had and raised with Zabbix were marked as pending the 2.x
>> branch, which is out now -- I'm not sure if they've been resolved or if the
>> framework to resolve them is now in place.)
>>
>> However, I'm specifically curious to see what people are using for
>> environments where the hosts can be spun up and down outside the control of
>> the normal provisioning channels. I know that there's been significant work
>> done lately by the Opscode folks to configure Nagios dynamically via Chef,
>> which is something I've got on my to-research list when I get beyond the
>> ops programming tasks on my plate right now. I believe the downside of that
>> would be whatever the interval is between a node being terminated and the
>> configuration being regenerated. I know that Zabbix also supports the idea
>> of dynamic node registration, which seems very applicable in this case, but
>> again, I'm not sure if it's got some kind of pruning capability in place.
>>
>> I'm also curious to know along these lines if anyone has worked with a
>> system (either native or with a connector) that will take advantage of
>> Amazon's CloudWatch metrics. I could certainly monitor things like CPU and
>> network utilization myself, but if AWS is already doing so, polling their
>> data seems like it would be easier. (Potentially cleaner? I'm undecided on
>> that, since it seems like it could introduce another dependency -- yet I've
>> never seen CloudWatch unavailable when the core EC2 services were working.
>> However, I may not have explored it in enough detail to see that kind of
>> failure, so... I remain undecided.)
>> One of the upsides of integrating with CloudWatch is that I can monitor
>> the same metrics that autoscaling is operating on, and I believe actually
>> retrieve those thresholds as well, rather than needing to configure them
>> by hand (or by role in Chef, but that would still need to be manually
>> updated if I changed the autoscaling parameters).
>>
>
> I haven't used the new AWS internal stuff;
> just please take note that the Nagios area has undergone major changes in
> the years since 2009. Stuff like Nagiosgraph is heavily dated. NRPE is
> almost completely obsolete in new setups.
> Configurations would be rule-based with inheritance of rules etc.
> Graphs would work like this:
> Does the check return performance data?
> => ok, let's paint a "PNP4Nagios" graph for it.
> No template for defining colours?
> => well, let's just use the default template
> And the GUI display just autodetects if there's a valid graph and if yes
> it'll display it.
>
> Stuff can be easy.
>
> My old employer was Mathias Kettner GmbH, which was responsible for a large
> amount of those changes and also kicked off a project called OMD, which has
> a nice installer bundle to stop wasting time on installing Nagios itself.
> That package also comes with alternate cores like "Icinga" or "Shinken"
> which some (some) consider much improved over Nagios.
>
> All this stuff can't be claimed to be aimed at cloud monitoring. Just,
> please, should you check out Nagios again, don't settle for doing stuff
> like it was done some years ago.
>
> Greets,
> Florian
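
For anyone following along, the "does the check return performance data?" decision Florian describes can be roughed out in a few lines of Python. This is only a sketch of the idea, not PNP4Nagios's actual code: `parse_perfdata`, `choose_graph_template` and the `templates` mapping are hypothetical names made up for illustration. It assumes the usual Nagios plugin convention that performance data follows the first `|` in the plugin output as `label=value[UOM];warn;crit;min;max`, and it deliberately ignores multi-line output, scientific notation and the `U` (unknown) value.

```python
import re
from typing import Dict, Optional

# One perfdata item: label=value[UOM] followed by optional ;warn;crit;min;max
_PERF_ITEM = re.compile(
    r"(?:'(?P<quoted>[^']+)'|(?P<bare>[^\s=']+))"       # label, optionally quoted
    r"=(?P<value>-?\d+(?:\.\d+)?)(?P<uom>[A-Za-z%]*)"   # numeric value and unit
    r"(?:;[^;\s]*)*"                                    # thresholds, skipped here
)


def parse_perfdata(plugin_output: str) -> Dict[str, dict]:
    """Return {label: {"value", "uom"}} for the text after the first '|'."""
    if "|" not in plugin_output:
        return {}
    perf = plugin_output.split("|", 1)[1]
    metrics = {}
    for match in _PERF_ITEM.finditer(perf):
        label = match.group("quoted") or match.group("bare")
        metrics[label] = {
            "value": float(match.group("value")),
            "uom": match.group("uom"),
        }
    return metrics


def choose_graph_template(
    plugin_output: str, templates: Dict[str, str], check_command: str
) -> Optional[str]:
    """Pick a graph template name, or None when there is nothing to graph.

    `templates` is a hypothetical mapping of check command -> template name.
    """
    if not parse_perfdata(plugin_output):
        return None  # no performance data, so no graph is shown
    # No specific template defined? Fall back to the default one.
    return templates.get(check_command, "default")
```

With output such as `OK - load average: 0.10 | load1=0.100;5.000;10.000;0;`, `choose_graph_template` falls back to `"default"` unless the mapping names a template for that check, and a check that prints no `|` section gets no graph at all.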
_______________________________________________ Discuss mailing list [email protected] https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss This list provided by the League of Professional System Administrators http://lopsa.org/
