On Wed, Mar 24, 2010 at 07:59:26PM +0000, Mario Giammarco wrote:
> Andrew Beekhof <and...@...> writes:
>
> > Have you seen:
> > http://www.clusterlabs.org/doc/crm_fencing.html
> >
> > > I have been led to believe that STONITH
> > > will help prevent split brain situations, but the LINBIT instructions do
> > > not provide any guidance on how to configure STONITH in the pacemaker
> > > cluster.
> >
> > Probably the 10 million dollar question is: does drbd really need stonith?
> >
> > I am interested too....

DRBD per se does not. Your data may or may not.

The main difference between a replicated and a shared solution here is,
if you do concurrent *uncoordinated* modifications
 * to a shared disk, you scramble your data.
 * to a DRBD, you get "diverging data sets".

So with DRBD, if you really lose all cluster communications,
and do NOT use STONITH, and ignore quorum loss etc.,
you can end up with both sides of the DRBD being Primary,
each consistent in itself, but diverging.

Once you realise this, you get to the fun part of choosing
which data set you want to keep, whether to try to "manually merge" them
(depending on the type of data that may even be possible,
but not on the DRBD level), or whether to scratch both versions
and restore from the latest backup anyway.

It may be a plausible assumption that no (relevant) modifications
are done on an isolated system, though, unless you happen to get client
communication going to both systems without re-establishing communication
between those systems.
Which is still entirely possible, of course.

DRBD resource-level fencing can help in variations of the following
scenario:

  all good.
  replication link breaks, other cluster comm channels still available.
  [A]
  Primary keeps going for a while
  Primary goes down
  [B]
  former Secondary takes over WITH STALE DATA.

[A] At this point, the resource-level fencing can use the other cluster
    communication channels (via the cib, or via dopd) to persistently
    record the "Outdated"ness of the then Secondary,
    so pacemaker would not even attempt to promote it at [B];
    respectively, it would refuse to be promoted
    (without applying brute --force).

Variations of that scenario include a Secondary crash instead of
replication link loss, and some are more difficult to explain.
Some may also require setting "fencing resource-and-stonith" in drbd.conf,
even though no stonith is actually applied, just for the side-effects
of that setting.
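
To make the [A]/[B] timeline above a bit more concrete, here is a toy
model in Python. It is only a sketch of the reasoning, not how DRBD or
pacemaker work internally: the names Node, promote and run_scenario, and
the write counters, are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str = "Secondary"   # "Primary", "Secondary" or "Down"
    outdated: bool = False     # persistently recorded "Outdated" marker
    writes_seen: int = 0       # how many writes have reached this node's disk


def promote(node: Node, force: bool = False) -> bool:
    """Try to promote a node; an Outdated node refuses unless forced."""
    if node.outdated and not force:
        return False
    node.role = "Primary"
    return True


def run_scenario(resource_level_fencing: bool) -> tuple[bool, bool]:
    """Replay the timeline; return (secondary_promoted, promoted_with_stale_data)."""
    primary = Node("alpha", role="Primary")
    secondary = Node("bravo")

    # All good: both sides have seen the same writes.
    primary.writes_seen = secondary.writes_seen = 10

    # Replication link breaks, other cluster comm channels still available.
    # [A] With resource-level fencing, the peer is marked Outdated
    #     over one of the remaining channels.
    if resource_level_fencing:
        secondary.outdated = True

    # Primary keeps going for a while, then goes down.
    primary.writes_seen += 5
    primary.role = "Down"

    # [B] The cluster tries to promote the former Secondary.
    promoted = promote(secondary)
    stale = promoted and secondary.writes_seen < primary.writes_seen
    return promoted, stale


if __name__ == "__main__":
    print("without resource-level fencing:", run_scenario(False))
    print("with resource-level fencing:   ", run_scenario(True))
```

Without the Outdated marker, the former Secondary is promoted although it
is missing the writes made after the link broke. With the marker, the
promotion is refused unless someone passes force=True, which stands in
for the "brute --force" above.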

Resource-level fencing (alone) is NOT sufficient if all (remaining)
cluster communication is lost at (virtually) the same time as the
replication link.

So if you need to protect yourself against that scenario,
you need to (also) configure STONITH.

STONITH alone does not help, either.
Above scenario again, but instead of "Primary goes down",
think "remaining cluster comm breaks".
Shoot-out, former Secondary wins.
But just because you can shoot someone
does not mean you have the bi^Wbetter data.

Sooo.
What do we do about those dollars now? ;-)

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.