Re: [DRBD-user] Not able to test Automatic split brain recovery policies

Digimer Thu, 11 Apr 2013 11:43:56 -0700

On 04/11/2013 08:27 AM, Dan Barker wrote:

-----Original Message-----
From: Shailesh Vaidya [mailto:shailesh_vai...@persistent.co.in]
Sent: Thursday, April 11, 2013 1:50 AM
To: Digimer
Cc: Dan Barker; drbd-user@lists.linbit.com
Subject: RE: [DRBD-user] Not able to test Automatic split brain recovery
policies


Hi Digimer,

Thanks for the help and explanation. I will try out the fencing option.

However, I would like to validate if what I am testing for split-brain is
correct or not. Also what could be done for simple split-brain auto-
recovery through configuration without fencing.


There is no "simple split-brain" recovery. Split Brain only occurs after an 
error of some sort causing two different nodes to write to the same resource while 
disconnected. Anything other than manual recovery of files or blocks will lose data. In 
many cases, it's not even possible to determine what data is being lost or how to recover 
it. You just have to pick the lesser of two evils and move forward, honoring the writes 
to one node and discarding the writes done on the other. Most applications and file 
systems react poorly to having writes of theirs discarded.

Any effort spent automating the recovery of a split-brain could better be spent 
identifying how your configuration created the split brain, usually dual 
primary without sufficient controls in place to prevent split brain in the 
first place.

ymmv

Dan


To build on Dan's comments;

Automatic split-brain recovery where both nodes were StandAlone and Primary is not possible. Consider this;


Say you want to recover by discarding the node with the least changes;

* Node 1 has an easily replaceable ISO written to it.
* Node 2 has accounting data written to it.

A human would know to discard Node 1, obviously, but "least changes" would cause Node 2 to get overwritten.

Say you want to recover by discarding oldest changes; Repeat the above example, but say that you record the accounting data an hour before the ISO is written. No better.
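
To make the two examples above concrete, here is a minimal Python sketch. It is only an illustration, not anything DRBD itself runs: the NodeWrites class, the two policy functions, and the sizes and timestamps are all made-up assumptions chosen to match the ISO vs. accounting data scenario.

```python
from dataclasses import dataclass


@dataclass
class NodeWrites:
    """Hypothetical summary of what one node wrote while disconnected."""
    name: str
    description: str
    bytes_changed: int      # amount of diverged data (made-up figure)
    last_write_minute: int  # minutes after the split when the write happened
    replaceable: bool       # known to a human, invisible to any policy


def victim_least_changes(a, b):
    """Policy: discard the node that changed the least data."""
    return a if a.bytes_changed < b.bytes_changed else b


def victim_oldest_changes(a, b):
    """Policy: discard the node whose changes are the oldest."""
    return a if a.last_write_minute < b.last_write_minute else b


# Node 1: a large, easily replaced ISO, written an hour after the split.
node1 = NodeWrites("node1", "ISO image", 4 * 1024**3, 60, True)
# Node 2: a small amount of accounting data, written right at the split.
node2 = NodeWrites("node2", "accounting data", 2 * 1024**2, 0, False)

for policy in (victim_least_changes, victim_oldest_changes):
    victim = policy(node1, node2)
    print(f"{policy.__name__}: discards {victim.name} ({victim.description})")
```

Neither policy ever looks at the replaceable field, because that knowledge exists only in the admin's head. With these figures both policies pick node2, the accounting data, as the victim.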

The only safe way to recover from a split-brain is to bring up the node you want to invalidate in StandAlone, mount the DRBD-backed FS or VM, back up all the data to somewhere else, invalidate it, connect it to the still-UpToDate node and let syncing begin, and then manually merge the just-backed-up data into the now-resync'ing DRBD-backed data.

This is clumsy, prone to human error and might well be very difficult or impossible, depending on the type of data stored on the DRBD resource.

*By far* the better option is to do everything you can to avoid a split-brain in the first place.


To test that you have accomplished that;

Set up fencing and then repeat your tests where you break the network connection. You should then see one node get rebooted and the remaining node continue. Once the fenced node powers back up, it should rejoin the good node without complaining about a split-brain. So if the rebooted node automatically rejoins, you know your configuration is working properly.

Another good test is to crash each node using 'echo c > /proc/sysrq-trigger'. You should see that the healthy node reboots the other node. If you have used a delay against a node, you should be able to see the difference in recovery time doing this test as well.
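
If you want to script the "did it rejoin cleanly" check after either test, something like the following Python sketch could work. It assumes the DRBD 8.x style of /proc/drbd status line (cs:, ro: and ds: fields); the two sample texts are illustrative, not captured from a real cluster, and parse_status and rejoined_cleanly are names made up for this example.

```python
import re

# Illustrative samples modeled on DRBD 8.x /proc/drbd output (assumed format).
SAMPLE_HEALTHY = """\
version: 8.4.3 (api:1/proto:86-101)
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
"""

SAMPLE_SPLIT = """\
version: 8.4.3 (api:1/proto:86-101)
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
"""

# Matches lines such as " 0: cs:Connected ro:... ds:..."
STATUS_RE = re.compile(
    r"^\s*(\d+):\s+cs:(\S+)\s+ro:(\S+)\s+ds:(\S+)", re.MULTILINE
)


def parse_status(text):
    """Return {minor: (connection_state, roles, disk_states)}."""
    return {
        int(m.group(1)): (m.group(2), m.group(3), m.group(4))
        for m in STATUS_RE.finditer(text)
    }


def rejoined_cleanly(text):
    """True only if at least one resource is listed and every listed
    resource is Connected with UpToDate disks on both sides."""
    status = parse_status(text)
    return bool(status) and all(
        cs == "Connected" and ds == "UpToDate/UpToDate"
        for cs, _roles, ds in status.values()
    )


print(rejoined_cleanly(SAMPLE_HEALTHY))
print(rejoined_cleanly(SAMPLE_SPLIT))
```

On a real node you would pass in the contents of /proc/drbd instead of the samples. Note that the check stays false while a resync is still running, so give the rebooted node time to finish syncing before you treat a false result as a failed test.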


digimer

--
Digimer
Papers and Projects: https://alteeve.ca/w/

What if the cure for cancer is trapped in the mind of a person without access to education?

_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
