> however, and last I worked with Zabbix it seemed to default to not alerting
> unless explicitly configured to do so. (It's been a while since we moved
> away from it, so my memory is a bit foggy. Near as I can recall, a
> configured alarm via a Zabbix agent check would not fire if the agent
> itself was not reachable, and the system did not natively support the
> concept of a "host down" alert in that situation, either. You had to
> manually configure a check of the network interfaces and the agent itself,
> which seemed very counter-intuitive, and led to many situations where we
> hadn't properly thought through all failure scenarios to configure the
> alarms explicitly enough.
>

The trick here is to monitor with nodata(): if you don't see ANY data from
a node in, say, 5 or 10 minutes, something is hosed and you need to be
alerted.

This doesn't rely on the agent or its checks: the nodata check is evaluated
by the Zabbix server itself, so it still fires when the agent is dead.

(Look at the nodata() function in
https://www.zabbix.com/documentation/1.8/manual/config/triggers .)
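To make the idea concrete, here is a minimal Python sketch of the logic a
nodata() trigger evaluates. It is a model, not Zabbix code: the names
`nodata` and `hosts_without_data` and the last-seen dictionary are
hypothetical, and the strict `>` at the period boundary is an arbitrary
choice. In Zabbix itself you'd write a trigger expression instead, in
1.8-era syntax something like `{web01:agent.ping.nodata(300)}=1`, with
`web01` standing in for your host.

```python
import time
from typing import Dict, List, Optional


def nodata(last_seen: Optional[float], period_s: float, now: float) -> int:
    """Model of nodata(period): 1 if nothing arrived within period_s seconds, else 0.

    last_seen is the timestamp of the newest value received for the item,
    or None if nothing has ever arrived (treated as "no data").
    """
    if last_seen is None:
        return 1
    # Strict ">" at the boundary is an arbitrary choice for this sketch.
    return 1 if now - last_seen > period_s else 0


def hosts_without_data(
    last_seen: Dict[str, Optional[float]],
    period_s: float,
    now: Optional[float] = None,
) -> List[str]:
    """Return the hosts whose modeled nodata() check would fire, sorted by name."""
    now = time.time() if now is None else now
    return sorted(host for host, ts in last_seen.items() if nodata(ts, period_s, now) == 1)


# Hypothetical last-seen timestamps, in seconds.
last_seen = {"web01": 1000.0, "db01": 1550.0}
print(hosts_without_data(last_seen, period_s=300, now=1600.0))  # ['web01']
```

Only `web01` is flagged, since 600 seconds have passed since it last
reported. The check needs nothing but the timestamps the server already
has, which is why a dead agent can't suppress it.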

Re pruning nodes out of Zabbix: I've not done that actively, since failures
tend to need investigation and manual intervention. But if you're dealing
with something elastic where nodes are spun down, I'd probably use Zabbix's
API to remove the nodes, or at least remove them from the active hostgroup.
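For the elastic case, a rough Python sketch of those two API calls, with
some loud assumptions: it builds raw JSON-RPC bodies for Zabbix's
`api_jsonrpc.php` endpoint using only the standard library; the token,
host ID, group ID and URL are hypothetical; and it assumes the
`host.delete` and `hostgroup.massremove` methods with the params shapes
shown. Those shapes (and how the session token is passed) have changed
between Zabbix releases, so check the API reference for your version.
Getting the session token (a login call) is left out.

```python
import json
import urllib.request
from typing import Any, Dict, List


def jsonrpc_request(method: str, params: Any, auth_token: str, request_id: int = 1) -> Dict[str, Any]:
    """Build a Zabbix-style JSON-RPC 2.0 request body (nothing is sent)."""
    return {
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        # Session token in the body; newer Zabbix releases move this to an
        # HTTP Authorization header, so check your version.
        "auth": auth_token,
        "id": request_id,
    }


def delete_hosts(auth_token: str, host_ids: List[str]) -> Dict[str, Any]:
    # Assumption: host.delete takes a plain array of host IDs.
    return jsonrpc_request("host.delete", host_ids, auth_token)


def remove_from_group(auth_token: str, group_id: str, host_ids: List[str]) -> Dict[str, Any]:
    # Assumption: hostgroup.massremove takes "groupids" and "hostids".
    params = {"groupids": [group_id], "hostids": host_ids}
    return jsonrpc_request("hostgroup.massremove", params, auth_token)


def post(api_url: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """POST a request body to api_jsonrpc.php and decode the JSON reply."""
    req = urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json-rpc"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical token, group ID and host ID; nothing is sent here.
payload = remove_from_group("example-session-token", "8", ["10105"])
print(json.dumps(payload))
# To actually send it (hypothetical URL):
# post("https://zabbix.example.com/api_jsonrpc.php", payload)
```

Removing the host from the group is the gentler option: the host and its
history stay in Zabbix, so you can still look into why it went away.
Swapping in `delete_hosts` gives the harder cleanup.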

--e
_______________________________________________
Discuss mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
 http://lopsa.org/
