> however, and last I worked with Zabbix it seemed to default to not alerting
> unless explicitly configured to do so. (It's been a while since we moved
> away from it, so my memory is a bit foggy. Near as I can recall, a
> configured alarm via a Zabbix agent check would not fire if the agent
> itself was not reachable, and the system did not natively support the
> concept of a "host down" alert in that situation, either. You had to
> manually configure a check of the network interfaces and the agent itself,
> which seemed very counter-intuitive, and led to many situations where we
> hadn't properly thought through all failure scenarios to configure the
> alarms explicitly enough.)
>
The trick here is to monitor with "nodata": if you don't see ANY data from a node in, say, 5 or 10 minutes, something is hosed and you need to get alerted. This doesn't rely on the agent or its checks; the nodata check is evaluated by the Zabbix server itself. (Look at https://www.zabbix.com/documentation/1.8/manual/config/triggers, the nodata() function.)

As for pruning nodes out of Zabbix: I haven't done that actively, since failures tend to need investigation and manual intervention. But if you're dealing with something elastic where nodes are spun down, I'd probably use Zabbix's API to remove the nodes, or at least remove them from the active hostgroup.

--e
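To make the nodata() idea concrete, here is a minimal Python sketch of the logic the server evaluates. It is not Zabbix code, just an illustration under stated assumptions: given the arrival times of values for one host's item, return 1 if nothing arrived within the last `period` seconds, else 0. In Zabbix 1.8 itself this would be a trigger expression along the lines of `{web01:agent.ping.nodata(300)}=1`; the host name `web01` and the `agent.ping` item are illustrative assumptions (any item the agent reports regularly would do), so check the trigger documentation for your version's exact syntax. The `silent_hosts` helper is likewise hypothetical.

```python
import time
from typing import Dict, List, Optional, Sequence


def nodata(timestamps: Sequence[float], period: float, now: Optional[float] = None) -> int:
    """Illustrative stand-in for the idea behind Zabbix's nodata(period).

    Returns 1 if no value arrived in the last `period` seconds, else 0.
    An empty history counts as "no data".
    """
    if now is None:
        now = time.time()
    cutoff = now - period
    # Any sample newer than the cutoff means the host is still reporting.
    return 0 if any(ts >= cutoff for ts in timestamps) else 1


def silent_hosts(last_seen: Dict[str, List[float]], period: float, now: float) -> List[str]:
    """Hypothetical helper: hosts whose items have gone quiet, i.e. alert candidates."""
    return sorted(host for host, ts in last_seen.items() if nodata(ts, period, now) == 1)


if __name__ == "__main__":
    now = 1_000_000.0
    samples = [now - 900, now - 650]  # last value arrived about 11 minutes ago
    print(nodata(samples, period=600, now=now))              # 1 -> alert
    print(nodata(samples + [now - 30], period=600, now=now))  # 0 -> host is alive
```

Because the check only looks at what the server has (or hasn't) received, it fires whether the agent crashed, the host is down, or the network path is broken, which is exactly the case an agent-side check can't cover.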
_______________________________________________
Discuss mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
 http://lopsa.org/
