On 16/03/14 11:14 AM, Lars Ellenberg wrote:
On Fri, Mar 14, 2014 at 10:44:54AM -0400, Digimer wrote:
On 14/03/14 05:34 AM, khaled atteya wrote:
A- In the DRBD User's Guide, in the explanation of "resource-only",
which is one of the fencing policies, it says:
"If a node becomes a disconnected primary, it tries to fence the
peer's disk. This is done by calling the fence-peer handler. The handler
is supposed to reach the
other node over alternative communication paths and call 'drbdadm
outdate minor' there."
My question is: if the handler can't reach the other node for any
reason, what will happen?
I always use 'resource-and-stonith', which blocks until the fence
action succeeds. As for the fence handler, I always pass the
requests up to the cluster manager. To do this, I use 'rhcs_fence'
on Red Hat clusters (cman + rgmanager) or crm-fence-peer.sh on
corosync + pacemaker clusters.
In either case, the fence action does not try to log into the other
node. Instead, it uses an external device, like IPMI or PDUs, and
forces the node off.
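As a concrete sketch of that setup on a cman + rgmanager cluster (resource name and the rhcs_fence install path are illustrative assumptions; on DRBD 8.4 the fencing keyword lives in the disk section, other versions may differ):

```
resource r0 {
    disk {
        # on replication link loss: block I/O and call the fence-peer handler
        fencing resource-and-stonith;
    }
    handlers {
        # hand the fence request up to the cluster manager;
        # path is where rhcs_fence is assumed to be installed
        fence-peer "/usr/sbin/rhcs_fence";
    }
}
```

DRBD unblocks I/O only after the handler exits with a success code.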
crm-fence-peer.sh *DOES NOT* fence (as in node-level fence aka stonith) the
other node.
It "fences" the other node by setting a pacemaker constraint
that prevents the other node from being, or becoming, Primary.
It tries hard to detect whether the replication link loss
was a node failure (in which case pacemaker will notice and stonith
anyways) or only some problem with the replication tcp connection,
in which case the other node is still reachable via cluster
communications and will notice and respect the new constraint.
If it seems to be a node level failure, it tries to wait until the cib
reflects it as "confirmed and expected down" (successful stonith).
There are various timeouts to modify the exact behaviour.
That script contains massive shell comments,
documenting its intended usage and functionality.
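On a corosync + pacemaker cluster, the handler pair those comments describe is typically wired up like this (paths are the usual locations shipped with DRBD; verify them on your system):

```
resource r0 {
    disk {
        fencing resource-and-stonith;
    }
    handlers {
        # sets a pacemaker constraint forbidding the peer to be/become Primary
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        # removes that constraint again once resync to this node has completed
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
}
```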
In a dual-primary setup, if it was a replication link failure only,
and cluster communication is still up, both will call that handler,
but only one will succeed to set the constraint. The other will remain
IO-blocked, and can optionally "commit suicide" from inside the handler.
Ooooooh, I misunderstood this. Thanks for clarifying!
Just because you were able to shoot the other node does not make your
data any better.
In a situation where you only use node level fencing from inside
this handler, the other node would boot back up, and if configured
to start cluster services by default, could start pacemaker, not see the
other node, do startup-fencing (shoot the still live Primary), and
conclude from being able to shoot the other node that its own, outdated,
version of the data would be good enough to go online.
Unlikely, but possible.
Which is why in this scenario, you should not start up cluster services
if you cannot see the peer, or at least refuse to shoot from the
DRBD fence-peer handler if your local disk state is only "Consistent"
(which it is after bringing up DRBD, if configured for such fencing,
if it cannot communicate with its peer).
So to be "good", you need both: the node level fencing,
and the drbd level fencing.
B- In Active/Passive mode, do these directives have any effect:
Do the directives "after-sb-0pri, after-sb-1pri, after-sb-2pri"
have an effect in Active/Passive mode, or only in Active/Active mode?
If they have an effect, what happens if I don't set them? Is there a
default value for each?
It doesn't matter what mode you are in, it matters what happened
during the time that the nodes were split-brained. If both nodes
were secondary during the split-brain, 0pri policy is used. If one
node was Primary and the other remained secondary, 1pri policy is
used. If both nodes were primary, even for a short time, 2pri is
used.
The 0, 1 and 2 in those names count the number of Primaries
at the moment of the DRBD handshake.
(If one had been secondary all along,
we would not have data divergence,
only "fast-forwardable" outdated-ness.)
Which means that if you happen to have a funky multi-failure scenario,
and end up doing the drbd handshake where the one with the better data
is Secondary, the one with the not-so-good data is Primary, and you have
configured "discard Secondary", you will be very disappointed.
All such auto-recovery strategies (except discard-zero-changes) automate data loss.
So you had better be sure you mean that.
The reason the policy doesn't matter so much is because the roles
matter, not how they got there. For example, if you or someone else
assumed the old primary was dead and manually promoted the
secondary, you have a two-primary split-brain, despite the normal
mode of operation.
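On the defaults question: if you do not set these options, all three policies default to "disconnect", i.e. DRBD drops the connection and waits for manual split-brain resolution. A sketch of where they live (DRBD 8.x syntax, resource name illustrative):

```
resource r0 {
    net {
        # these are the defaults; "disconnect" = no automatic recovery,
        # go StandAlone and wait for the admin to resolve the split-brain
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
    }
}
```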
C- can I use SBD fencing with drbd+pacemaker rather than IPMI or PDU?
No, I do not believe so. The reason being that if the nodes
split-brain, both will think they have access to the "SAN" storage.
Whereas with a real (external) SAN, it's possible to say "only one
node is allowed to talk and the other is blocked". There is no way
for one node to block access to the other node's local DRBD data.
IPMI/PDU fencing is certainly the way to go.
You cannot use the replication device as its own fencing mechanism.
That's a dependency loop.
You can still use DRBD, and SBD, but you would have a different,
actually shared IO medium for the SBD, independent from your DRBD setup.
Of course you can also put an SBD on an iSCSI export from a different
DRBD cluster, as long as that other iSCSI + DRBD cluster is properly
set up to avoid data divergence there under any circumstances...
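If you do build SBD on such a genuinely independent shared medium, the SBD side simply points at that device; e.g. in /etc/sysconfig/sbd (the device path is a placeholder, not a recommendation):

```
# /etc/sysconfig/sbd -- the device path below is a placeholder
SBD_DEVICE="/dev/disk/by-id/scsi-SHARED_SBD_DISK"
SBD_WATCHDOG_DEV="/dev/watchdog"
```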
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user