On Sat, 16 Jun 2012, Dennis Birkholz wrote:

> I would like to monitor a setup that consists of two "backend" servers
> ServerA and ServerB which are hosting virtual machines VM1, VM2, ...
>
> We have two DRBDs disk1 and disk2 where disk1 is primary on ServerA
> (containing the data for VM1) and disk2 is primary on ServerB
> (containing the data for VM2).
>
> If one server fails, the DRBD is switched to the other server and the VMs
> are restarted there.

    This sounds very much like "classic" HA (e.g. "Veritas VCS") and
the best way I've found to monitor such a setup is to monitor each
physical host independently for various things like performance
parameters, disk space, availability, and the like on the physical
hosts' management interfaces, and then monitor the applications on
the applications' IP address (which migrate from host to host as
needed).
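A minimal sketch of that split, assuming hypothetical host names, addresses from the 192.0.2.x documentation range, and a host template named `generic-host` (as in the stock sample configuration). It renders Icinga 1.x-style host objects so that the physical hosts are checked on their fixed management addresses while each application gets its own host object on the service IP that follows it:

```python
# Hypothetical inventory: management IPs stay with the physical host,
# while each application's service IP migrates with the VM.
PHYSICAL_HOSTS = {"ServerA": "192.0.2.10", "ServerB": "192.0.2.11"}
APPLICATION_IPS = {"VM1": "192.0.2.101", "VM2": "192.0.2.102"}


def host_block(name, address, alias):
    """Render one Icinga 1.x-style 'define host' object as text."""
    return (
        "define host {\n"
        "    use        generic-host\n"  # assumed template name
        f"    host_name  {name}\n"
        f"    alias      {alias}\n"
        f"    address    {address}\n"
        "}\n"
    )


def render_config():
    """Physical hosts on management IPs, applications on service IPs."""
    blocks = [
        host_block(name, ip, "physical host (management interface)")
        for name, ip in PHYSICAL_HOSTS.items()
    ]
    blocks += [
        host_block(name, ip, "application service IP")
        for name, ip in APPLICATION_IPS.items()
    ]
    return "\n".join(blocks)


print(render_config())
```

Application service checks would then hang off VM1/VM2 rather than ServerA/ServerB, so they keep working wherever the VM happens to run.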

    Most of the base monitoring I do uses SNMP calls (to NET-SNMP),
and to catch the switch of the shared disks from one physical
host to the other I've put logic in the switching code to restart
the SNMP agent thereby making the just-switched disk visible to
it.  From this, it is possible to infer what's running where -- and
to raise an alarm if that's appropriate -- even if the application
probes show nothing untoward (failovers usually happen so quickly
that the monitoring system misses them unless there are asynchronous
notifications, e.g. SNMP traps, in play to catch them).  Also, if one
loses a physical host (thereby putting both instances on a single
server), that can be detected by looking at the physical hosts' IP
addresses (since the monitoring application is HA, if the host it's
on fails it'll switch to the other and still be operational) and
you'll get an alert of a "host down" condition.
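The inference step might look something like the following. This is a sketch assuming each DRBD-backed filesystem has a known (here hypothetical) mount point that appears in the dskPath column of the host's dskTable once the agent has been restarted; actually fetching the values (e.g. with snmpwalk) is left out:

```python
# Hypothetical mount point of each DRBD-backed instance's filesystem,
# as it would appear in the dskPath column of NET-SNMP's dskTable.
INSTANCE_MOUNTS = {
    "VM1": "/srv/disk1",
    "VM2": "/srv/disk2",
}


def infer_placement(dsk_paths_by_host):
    """Return {instance: tuple of hosts whose dskTable shows its mount}.

    dsk_paths_by_host maps each reachable physical host to the dskPath
    values read from it.  An empty tuple means the disk is not visible
    anywhere; more than one host would point to a split-brain.
    """
    return {
        instance: tuple(
            sorted(h for h, paths in dsk_paths_by_host.items() if mount in paths)
        )
        for instance, mount in INSTANCE_MOUNTS.items()
    }


def detect_moves(previous, current):
    """List (instance, old_hosts, new_hosts) for every placement change.

    Comparing successive snapshots catches failovers that were too quick
    for the application probes to notice.
    """
    return [
        (instance, previous.get(instance), hosts)
        for instance, hosts in current.items()
        if previous.get(instance) != hosts
    ]


before = infer_placement({"ServerA": ["/", "/srv/disk1"], "ServerB": ["/", "/srv/disk2"]})
after = infer_placement({"ServerB": ["/", "/srv/disk1", "/srv/disk2"]})
print(detect_moves(before, after))  # [('VM1', ('ServerA',), ('ServerB',))]
```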

    The situation is a bit more grey when both physical hosts are up
but both HA instances are running on a single host (e.g. after a
failover and following the repair of the failed host); in this case,
one infers that both instances are on the same host and either raises
an alarm or configures an event-handler to switch the "failed"
instance back to its proper host.  (As above, the placement is
inferred via NET-SNMP's dskTable or the HOST-RESOURCES-MIB OID tree.)
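The "both on one host" case could be turned into a check along these lines. This is a sketch assuming a hypothetical home-host map and a placement already inferred from the disk tables; it returns the standard plugin exit codes, so an event handler keyed on the WARNING state could be what triggers the switch back:

```python
# Standard plugin exit codes understood by Nagios/Icinga.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical "proper" host for each instance.
HOME_HOST = {"VM1": "ServerA", "VM2": "ServerB"}


def check_placement(placement, hosts_up):
    """Return (exit_code, message) for the current instance placement.

    placement maps each instance to the host holding its disk (None if
    the disk is not visible anywhere); hosts_up is the set of physical
    hosts answering on their management addresses.
    """
    missing = sorted(i for i, host in placement.items() if host is None)
    if missing:
        return CRITICAL, "disk not visible on any host: " + ", ".join(missing)

    # An instance away from its home host while that host is up is the
    # "grey" case: the failed host was repaired but nothing moved back.
    # If the home host is down, the host-down alert already covers it.
    stranded = sorted(
        i for i, host in placement.items()
        if host != HOME_HOST[i] and HOME_HOST[i] in hosts_up
    )
    if stranded:
        return WARNING, "not on home host although it is up: " + ", ".join(stranded)
    return OK, "no instance is stranded away from a healthy home host"
```

Wrapped as a plugin, a script would print the message and exit with the returned code.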

    Cheers!

+------------------------------------------------+---------------------+
| Carl Richard Friend (UNIX Sysadmin)            | West Boylston       |
| Minicomputer Collector / Enthusiast            | Massachusetts, USA  |
| mailto:crfri...@rcn.com                        +---------------------+
| http://users.rcn.com/crfriend/museum           | ICBM: 42:22N 71:47W |
+------------------------------------------------+---------------------+

_______________________________________________
icinga-users mailing list
icinga-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/icinga-users