On Sat, 16 Jun 2012, Dennis Birkholz wrote:

> I would like to monitor a setup that consists of two "backend" servers,
> ServerA and ServerB, which are hosting virtual machines VM1, VM2, ...
>
> We have two DRBDs, disk1 and disk2, where disk1 is primary on ServerA
> (containing the data for VM1) and disk2 is primary on ServerB
> (containing the data for VM2).
>
> If one server fails, the DRBD is switched to the other server and the VMs
> are restarted there.
This sounds very much like "classic" HA (e.g. Veritas VCS), and the best way I've found to monitor such a setup is to monitor each physical host independently on its management interface (performance parameters, disk space, availability, and the like), and then to monitor the applications on the applications' own IP addresses, which migrate from host to host as needed.

Most of the base monitoring I do uses SNMP calls (to NET-SNMP). To catch the switch of a shared disk from one physical host to the other, I've put logic in the switching code that restarts the SNMP agent, thereby making the just-switched disk visible to it. From this it is possible to infer what's running where, and to raise an alarm if that's appropriate, even if the application probes show nothing untoward. (Failovers usually happen so quickly that the monitoring system misses them unless asynchronous notifications, e.g. SNMP traps, are in play to catch them.)

Also, if one loses a physical host (thereby putting both instances on a single server), that can be detected by probing the physical hosts' IP addresses, and you'll get a "host down" alert. Since the monitoring application is itself HA, if the host it's on fails it will switch to the other host and still be operational.

The situation is a bit more grey when both physical hosts are up but both HA instances are running on a single host (e.g. after a failover and the subsequent repair of the failed host). In this case one infers, as above via NET-SNMP's dskTable or the HOST-RESOURCES OID tree, that both instances are on the same host, and either raises an alarm or configures an event handler to switch the "failed" instance back to its proper host.

Cheers!
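For illustration, here is a minimal sketch of the "infer what's running where" step written as a Python check that follows the usual Nagios/Icinga plugin convention (exit codes 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). It assumes the per-host disk lists have already been collected, e.g. by walking dskPath or hrStorageDescr on each host's management address; the disk names, host names, and the `PREFERRED_HOST` table are hypothetical placeholders, not part of the setup described above.

```python
"""Sketch: infer where each HA instance runs from per-host disk visibility."""
from typing import Dict, List, Set, Tuple

# Standard Nagios/Icinga plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# HYPOTHETICAL layout: each shared DRBD disk and the physical host it should
# normally be primary on. In practice the keys would be whatever mount point
# dskPath (UCD-SNMP-MIB) or hrStorageDescr (HOST-RESOURCES-MIB) reports.
PREFERRED_HOST: Dict[str, str] = {"disk1": "ServerA", "disk2": "ServerB"}


def locate_disks(visible: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert host -> visible disks into disk -> hosts that report it.

    Filesystems not listed in PREFERRED_HOST (e.g. "/") are ignored.
    """
    where: Dict[str, Set[str]] = {disk: set() for disk in PREFERRED_HOST}
    for host, disks in visible.items():
        for disk in disks:
            if disk in where:
                where[disk].add(host)
    return where


def check_placement(visible: Dict[str, Set[str]],
                    up_hosts: Set[str]) -> Tuple[int, str]:
    """Return (exit code, status line) in the style of an Icinga check plugin."""
    where = locate_disks(visible)
    problems: List[str] = []
    status = OK
    for disk, home in sorted(PREFERRED_HOST.items()):
        hosts = where[disk]
        if not hosts:
            return CRITICAL, f"CRITICAL - {disk} not visible on any host"
        if len(hosts) > 1:
            return CRITICAL, f"CRITICAL - {disk} visible on {', '.join(sorted(hosts))}"
        (current,) = hosts
        if current != home:
            if home in up_hosts:
                # Grey case: the home host is back up, but the instance has not
                # been switched back. Candidate for an event handler.
                status = WARNING
                problems.append(f"{disk} on {current}, home {home} is up")
            else:
                # Home host is down; its own "host down" alert covers this.
                problems.append(f"{disk} failed over to {current}")
    label = {OK: "OK", WARNING: "WARNING"}[status]
    detail = "; ".join(problems) if problems else "all disks on preferred hosts"
    return status, f"{label} - {detail}"


if __name__ == "__main__":
    # ServerA has been repaired after a failover, but both disks are still
    # primary on ServerB.
    visible = {"ServerA": set(), "ServerB": {"disk1", "disk2"}}
    code, message = check_placement(visible, up_hosts={"ServerA", "ServerB"})
    print(message)  # a real plugin would then call sys.exit(code)
```

With the sample data, disk1 is reported as WARNING because its home host is up again; a real deployment would hand the exit code to Icinga via `sys.exit(code)`, and the WARNING state could drive an event handler that fails the instance back. A missing disk or a disk reported by both hosts is treated as CRITICAL here, which is a design choice of this sketch rather than anything prescribed above.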
+------------------------------------------------+---------------------+
| Carl Richard Friend (UNIX Sysadmin)            | West Boylston       |
| Minicomputer Collector / Enthusiast            | Massachusetts, USA  |
| mailto:crfri...@rcn.com                        +---------------------+
| http://users.rcn.com/crfriend/museum           | ICBM: 42:22N 71:47W |
+------------------------------------------------+---------------------+

_______________________________________________
icinga-users mailing list
icinga-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/icinga-users