Ended up having a chat with Tom offline about monitoring systems and
he asked me to post my conclusion - so not really looking for
feedback, but this may stir up some discussion anyway.  Happy to
answer questions if anyone has any interest.

I am just about to deploy Zabbix internally.  This came after a fair
bit of debate and goes against previous experience.  At my previous
company we used Nagios with great success.  However, we also had a
fully automated solution set up such that an entry in a text file for
a hostname would create all the DNS records and update the Nagios
configuration for monitoring.  At my new $JOB I am starting from
scratch.  There was nothing except PCs and a misconfigured domain
controller when I arrived.  As such there was no automation yet, so
Nagios configuration would have been a burden.
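
For anyone who has not seen that style of automation, here is a rough
sketch of the idea in Python.  It is not the tooling we actually ran.
The input format (one "hostname address" pair per line), the function
names and the "generic-host" template name are all assumptions made up
for illustration.

```python
# Sketch only: turn a plain text host list into DNS A records and
# Nagios host definitions.  File format and names are hypothetical.

def parse_hosts(text):
    """Parse lines of 'hostname address'; '#' starts a comment."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue  # skip blank lines and pure comments
        name, addr = line.split()
        hosts.append((name, addr))
    return hosts


def dns_record(name, addr):
    """Render one zone-file style A record."""
    return f"{name}\tIN\tA\t{addr}"


def nagios_host(name, addr, template="generic-host"):
    """Render one Nagios host object; the template name is an assumption."""
    return (
        "define host {\n"
        f"    use        {template}\n"
        f"    host_name  {name}\n"
        f"    address    {addr}\n"
        "}\n"
    )


if __name__ == "__main__":
    for name, addr in parse_hosts("web1 10.10.0.5\nmail1 10.10.0.6\n"):
        print(dns_record(name, addr))
        print(nagios_host(name, addr))
```

The point is that one line in one file drives both outputs, so a host
cannot end up in DNS without also ending up in monitoring.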

Zabbix appeals to me because you can set up auto discovery leading to
triggered actions.  For example: scan 10.10.0.0/24, IF you find a host
AND it has a Zabbix agent installed on it AND it answers with a uname
containing something LIKE "SunOS" THEN add it to the Solaris servers
group, and link it to all the production checks, and so on.  So there
is a large configuration burden up front - working out what I want to
check and what inferences should be made about hosts based on their
uname and network location.  However, after that, anyone setting up a
server in the right network without using the word "beta" in the
hostname will get sucked into the monitoring system with no further
work.  It will be impossible to forget to monitor a server.
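
To make the rule chain above concrete, here is the same decision
logic written out as a small Python function.  This is not Zabbix
configuration (in Zabbix you build this from discovery rules and
actions); it is only a sketch of the inference I want the system to
make.  The function name and the group names are hypothetical.

```python
import ipaddress

# The network being scanned, taken from the example above.
DISCOVERY_NET = ipaddress.ip_network("10.10.0.0/24")


def classify(ip, hostname, agent_uname):
    """Return the groups a discovered host should join, or [] to skip it.

    agent_uname is the uname string the agent answered with, or None
    if no agent responded.  Group names are made up for illustration.
    """
    if ipaddress.ip_address(ip) not in DISCOVERY_NET:
        return []  # outside the scanned network
    if agent_uname is None:
        return []  # no agent installed or not answering
    if "beta" in hostname.lower():
        return []  # beta boxes are deliberately left out
    if "SunOS" in agent_uname:
        return ["Solaris servers", "Production checks"]
    return []  # no rule written for this platform yet


if __name__ == "__main__":
    print(classify("10.10.0.12", "db1", "SunOS db1 5.10 Generic sun4u"))
    print(classify("10.10.0.13", "db1-beta", "SunOS db1-beta 5.10"))
```

Each extra platform is one more branch at the bottom, which is where
the up-front configuration effort goes.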

In addition, Zabbix will deal with agent or agentless checks (e.g. is
port 25 open, and how quickly does it respond to the regular
polling).  It can (with a little magic) receive SNMP traps.  It has
some very clever triggers so that you can fire based on trending,
e.g. fire if load average has stayed above 2.5 for more than 60
seconds - don't just fire out alerts on spikes.  It does performance
graphing all built in (no need for external RRD handling).  If you
don't mind installing agents, there are binaries for a variety of
platforms so that Unix folk can remain clueless about Windows if they
wish: just install the agent and use the standard checks.  It handles
escalation and repeat notification, e.g. keep bugging me with a text
message every 5 minutes for 30 minutes, and if there is no ack, wake
up the boss.
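
As a rough illustration of those last two ideas (fire on a sustained
condition rather than a spike, and escalate if nobody acknowledges),
here is a minimal Python sketch.  It is not how Zabbix implements
either feature.  The function names, the "on-call" and "boss" labels
and the default timings are assumptions lifted from my example above.

```python
def sustained_above(samples, threshold, window, now):
    """True if every sample in the last `window` seconds exceeds threshold.

    samples is a list of (timestamp_seconds, value) pairs.  A single
    low reading inside the window is enough to keep the trigger quiet,
    which is what filters out short spikes.
    """
    recent = [value for ts, value in samples if ts >= now - window]
    return bool(recent) and min(recent) > threshold


def escalation_target(minutes_since_problem, acknowledged,
                      repeat_every=5, escalate_after=30):
    """Return who to notify at this minute, or None for nobody."""
    if acknowledged:
        return None  # someone has the problem, stop nagging
    if minutes_since_problem % repeat_every != 0:
        return None  # only notify on the repeat interval
    if minutes_since_problem < escalate_after:
        return "on-call"
    return "boss"


if __name__ == "__main__":
    load = [(0, 1.0), (30, 3.0), (60, 3.1), (90, 2.9)]
    print(sustained_above(load, 2.5, 60, now=90))
    print(escalation_target(30, acknowledged=False))
```

The escalation function is deliberately stateless: given the age of
the problem and whether it has been acked, it always gives the same
answer, which makes the policy easy to reason about.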

So that's the good.  On the downside, it has a 320 page configuration
manual translated (semi-successfully) from Latvian.  The forums and
development are active, but unless you get lucky there are no
responses to any questions - and I have plenty.  Commercial support is
available, but if I had budget to spend on network monitoring then I
would probably have pitched for something commercial in the first
place.  There are no good example configuration guides out there that
I have found beyond fairly trivial basic stuff.

Hopefully this is helpful, and may start a debate.  Anyone hate Zabbix
with a flaming passion in favour of anything else?  Other than Nagios,
the only other serious consideration was OpenNMS.  However, I found
this to be a little too SNMP focused, and it has an intangible feel of
just being unfriendly.  Hard to explain, but I just didn't like the
dashboard and found it less familiar/obvious to navigate than Nagios
or Zabbix.

Thoughts?

Rob
