I ended up having a chat with Tom offline about monitoring systems and he asked me to post my conclusion - so I am not really looking for feedback, but this may stir up some discussion anyway. Happy to answer questions if anyone is interested.
I am just about to deploy Zabbix internally. This came after a fair bit of debate and goes against my previous experience. At my previous company we used Nagios with great success. However, we also had a fully automated setup, such that an entry in a text file for a hostname would create all the DNS records and update the Nagios configuration for monitoring.

At my new $JOB I am starting from scratch. There was nothing except PCs and a misconfigured domain controller when I arrived. As such there was no automation yet, so Nagios configuration would have been a burden.

Zabbix appeals to me because you can set up auto discovery leading to triggered actions, e.g. scan 10.10.0.0/24, IF you find a host AND it has a Zabbix agent installed on it AND it answers with a uname containing something LIKE "SunOS" THEN add it to the solaris servers group, and link it to all the production checks, etc. So there is a large configuration burden up front - working out what I want to check and what inferences should be made about hosts based on their uname and network location. However, after that, anyone setting up a server in the right network without using the word "beta" in the hostname will get sucked into the monitoring system with no further work. It will be impossible to forget to monitor a server.

In addition, Zabbix will deal with agent or agentless checks (e.g. is port 25 open, and how quickly does the service respond to regular polling). It can (with a little magic) receive SNMP traps. It has some very clever triggers so that you can fire based on trending, e.g. fire if load average has been above 2.5 for more than 60 seconds - don't just fire out alerts on spikes. It does performance graphing all built in (no need for external RRD handling). If you don't mind installing agents, there are binaries for a variety of platforms so that Unix folk can remain clueless about Windows if they wish: just install the agent and use the standard checks.
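To make the discovery rule and the "sustained load" trigger described above a bit more concrete, here is a rough Python sketch of the logic only. It is not Zabbix configuration and does not use any Zabbix API; the names `Host`, `classify` and `sustained_above`, and the returned labels, are made up for illustration.

```python
import ipaddress
from dataclasses import dataclass

# Illustrative model only: none of these names come from Zabbix.
DISCOVERY_NET = ipaddress.ip_network("10.10.0.0/24")


@dataclass
class Host:
    name: str
    ip: str
    has_agent: bool  # did an agent answer on this host?
    uname: str       # whatever the agent reported for uname


def classify(host):
    """Return the labels a discovered host would be linked to, or [] to skip it."""
    if ipaddress.ip_address(host.ip) not in DISCOVERY_NET:
        return []  # outside the scanned network
    if not host.has_agent:
        return []  # no agent, so nothing to infer from
    if "beta" in host.name.lower():
        return []  # "beta" hosts are deliberately left out
    if "SunOS" in host.uname:
        return ["solaris servers", "production checks"]
    return []      # no rule matched (other OS rules would go here)


def sustained_above(samples, threshold, window):
    """True if every sample in the last `window` seconds is above `threshold`.

    `samples` is a list of (timestamp_seconds, value) pairs, oldest first.
    Returns False if the data does not yet span a full window, so a single
    spike cannot fire an alert.
    """
    if not samples:
        return False
    latest = samples[-1][0]
    recent = [(t, v) for t, v in samples if latest - t <= window]
    covers_window = latest - recent[0][0] >= window
    return covers_window and all(v > threshold for _, v in recent)
```

For example, `classify(Host("db1", "10.10.0.5", True, "SunOS db1 5.10"))` picks up both labels, while the same host named "db1-beta" is skipped. In Zabbix itself these conditions live in discovery rules, actions and trigger expressions rather than in code like this.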
It handles escalation and repeat notification, e.g. keep bugging me with a text message every 5 minutes for 30 minutes; if there is no ack, wake up the boss.

So that's the good. On the downside, it has a 320 page configuration manual translated (semi-successfully) from Latvian. The forums and development are active, but unless you get lucky there are no responses to any questions - and I have plenty. Commercial support is available, but if I had budget to spend on network monitoring then I would probably have pitched for something commercial in the first place. I have not found any good example configuration guides beyond fairly trivial stuff.

Hopefully this is helpful, and may start a debate. Anyone hate Zabbix with a flaming passion in favour of anything else?

Other than Nagios, the only other serious consideration was OpenNMS. However, I found this to be a little too SNMP focused, and it has an intangible feel of just being unfriendly. Hard to explain, but I just didn't like the dashboard and found it less familiar/obvious to navigate than Nagios or Zabbix.

Thoughts?

Rob
_______________________________________________
Discuss mailing list
[email protected]
http://lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
http://lopsa.org/
