Re: [Linux-HA] Q: DRBD+OCFS: Fencing

Tim Serong Thu, 22 Dec 2011 20:39:36 -0800

On 12/22/2011 09:19 PM, Ulrich Windl wrote:
> Hello!
>
> Reading the DRBD Guide for DRBD with OCFS (with Pacemaker), I see it suggests
> that fencing needs to be done whenever there is a problem with one of the
> nodes running DRBD.
>
> I really wonder why: Why shoot the node if one out of several resources has a 
> problem? Why not try a disconnect/reconnect first? It should be faster anyway.
>
> Also, if you are using different networks for cluster, access, and
> replication, why assume that the cluster communication is dead if one DRBD
> resource has a problem? While it may sound incredibly cool for the
> developers to reset any node in the cluster, this is the most annoying thing
> in practice, especially as you have little chance of debugging the problem.
>
> Would someone explain the rationale behind?


Kind of depends on exactly what's broken, but, in general, if *any* 
filesystem/storage resource fails and cannot be cleanly stopped on some 
node, the only safe thing to do is kill the entire node.  If you don't 
do this, you can't safely restart the resource on another node 
(non-clustered filesystem), and/or can't continue writing to a clustered 
filesystem like OCFS2 on some surviving node without risking data 
corruption.
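To make that reasoning concrete, here is a toy decision sketch in Python. All names (`Recovery`, `plan_recovery`) are hypothetical and this is not Pacemaker code; it only encodes the rule above: a failed stop means the node may still be writing, so it must be fenced before anything else. (This roughly mirrors Pacemaker's own behaviour, where a failed stop operation leads to fencing by default when STONITH is enabled.)

```python
from enum import Enum


class Recovery(Enum):
    """Hypothetical recovery outcomes for a failed storage resource."""
    RESTART_ON_OTHER_NODE = "restart the resource on another node"
    SURVIVORS_KEEP_WRITING = "surviving nodes keep writing to the clustered FS"
    FENCE_NODE_FIRST = "fence the whole node before any recovery"


def plan_recovery(stop_succeeded: bool, clustered_fs: bool) -> Recovery:
    """Toy decision: how can the cluster safely recover a failed resource?"""
    # If the stop failed, the node may still have the device open and may
    # still issue writes; only killing the whole node rules that out.
    if not stop_succeeded:
        return Recovery.FENCE_NODE_FIRST
    # A clean stop means the node has released the storage, so it is safe
    # for the surviving nodes to carry on writing to a clustered FS...
    if clustered_fs:
        return Recovery.SURVIVORS_KEEP_WRITING
    # ...or, for a non-clustered filesystem, to mount it on another node.
    return Recovery.RESTART_ON_OTHER_NODE
```

For example, `plan_recovery(stop_succeeded=False, clustered_fs=True)` returns `Recovery.FENCE_NODE_FIRST`: whether or not the filesystem is clustered, a node that failed to stop its storage resource has to go.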

More specifically, regarding dual-primary DRBD, from
http://www.linbit.com/en/education/tech-guides/dual-primary-think-twice/

"When having a Dual-Primary resource, we principally have to assume that 
as soon as the two nodes sharing that DRBD drive get disconnected from 
each other, uncoordinated write attempts can happen on either of them. 
Measures need to be taken to make sure that when a node is in trouble,
that node cannot cause corruption of a set of data anymore - welcome to
fencing."
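A toy Python model of that dual-primary scenario, under stated assumptions: the class and function names are hypothetical, this is not real DRBD state handling, and we simply assume node-a wins the fencing race. After the replication link drops, both Primaries keep accepting writes unless one side is fenced first.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """One DRBD node in Primary role (toy model, not real DRBD state)."""
    name: str
    powered_on: bool = True
    # Writes accepted locally after the replication link dropped, which the
    # peer therefore never received.
    unreplicated: list = field(default_factory=list)

    def write(self, block: int, data: str) -> None:
        # A fenced (powered-off) node cannot write anything any more.
        if self.powered_on:
            self.unreplicated.append((block, data))


def disconnect_scenario(fence_peer: bool) -> bool:
    """Return True if the two replicas end up in split brain."""
    a, b = Primary("node-a"), Primary("node-b")
    # The replication link has just gone down; both nodes are still Primary.
    if fence_peer:
        # Assume node-a wins the fencing race and powers off node-b
        # before either side resumes I/O.
        b.powered_on = False
    # The filesystem on each node keeps writing to the same block, uncoordinated.
    a.write(7, "X")
    b.write(7, "Y")
    # Split brain: each side holds writes the other never saw, so there is
    # no single up-to-date copy to resynchronise from.
    return bool(a.unreplicated) and bool(b.unreplicated)
```

`disconnect_scenario(fence_peer=False)` returns `True` (divergent data on both nodes), while `disconnect_scenario(fence_peer=True)` returns `False`: node-b's copy is merely outdated and can be resynchronised from node-a when it comes back.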

Regards,

Tim
-- 
Tim Serong
Senior Clustering Engineer
SUSE
_______________________________________________
Linux-HA mailing list
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems