On 01/29/2011 02:43 AM, Lewis Shobbrook wrote:
> That's correct, no sync has taken place & it is still un-synced.
>
> ...
>
> The resource nodes are still disconnected and no override has been used to
> force the situation.
> The only commands issued have been drbdadm connect all, drbdadm connect x2,
> drbdadm primary x2 (on the only node that has ever been primary) and drbdadm
> attach.
> I'm the only one with access to these machines; I can assure you sync has not
> been forced at any time.
>
> The only log record against this resource in all archived messages prior to
> the system restart is:
> Jan 11 11:30:31 emlsurit-v4 kernel: [7745016.672246] block drbd9: disk(
> UpToDate -> Diskless )
> I expect this is the point at which the drbdadm detach was issued, while the
> node was primary and active.
Holy shit. Now this is a useful piece of information. You made your Primary
diskless 12 days before your alleged DRBD problem. This of course leads to all
writes being done on the Secondary (you're in a degraded state). This is all
fine, except that after the reboot you made your main node (the one that had
been diskless for those 12 days) Primary again before the handshake took
place. Hence the split-brain.

> From the command history I can't determine which node the detach was issued
> from.

The one that went diskless. Detach is a local operation and doesn't affect the
peer.

> Does it matter which node a drbdadm detach is issued from?

Yes, it's essential.

> On node A it details system start Jan 23 15:07:16.
> The resource was later set primary before network connection between the
> nodes...
> Jan 23 15:53:01 emlsurit-v4 kernel: [ 2756.121108] block drbd9: role(
> Secondary -> Primary )
> Jan 23 15:53:01 emlsurit-v4 kernel: [ 2756.122546] block drbd9: Creating new
> current UUID
> A minute later we can see the KVM instance start up and libvirt access the
> resource...

As stated above, going Primary while unconnected is potentially harmful.

> Perhaps what has been confusing the matter is my initial post associating
> split-brain with the data loss.
> The node was primary and active prior to any split-brain, and it seems to me
> that the rollback/loss of data had occurred prior to split-brain.
> The only conceivable possibility to me is still that NodeA has rolled back
> or discarded changes in its activity log following the restart.

It has not. During split-brain, no data is synced until you allow it.

> As far as I can determine this occurred prior to the split-brain, while the
> resource nodes were still disconnected (prior to restoration of network
> connectivity).

Outright impossible.

> Just to be thorough, I'll export the KVM instance XML and start it up to
> investigate the other node, but do not expect to find the data that's missing
> there.

You should.
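The rule above (never promote a resource that is disconnected or diskless) can
be sketched as a small guard. This is a minimal sketch, not part of DRBD: the
helper name safe_to_promote is hypothetical, and the state strings are the
ones `drbdadm cstate` / `drbdadm dstate` report on DRBD 8.x:

```shell
# safe_to_promote CSTATE DSTATE  ->  exit 0 only if promotion looks safe.
# Hypothetical helper; in practice you would feed it the live output of
# `drbdadm cstate <res>` and `drbdadm dstate <res>`.
safe_to_promote() {
    cstate=$1
    local_dstate=${2%%/*}   # dstate looks like "UpToDate/UpToDate"; keep local half
    [ "$cstate" = "Connected" ] && [ "$local_dstate" = "UpToDate" ]
}

# Examples with states as drbdadm would print them:
safe_to_promote Connected UpToDate/UpToDate && echo "ok to promote"
safe_to_promote StandAlone Diskless/DUnknown || echo "do NOT promote"
```

Recovery from the split-brain itself would then be the usual manual procedure,
roughly: on the victim, drbdadm secondary x2 && drbdadm connect
--discard-my-data x2, and on the survivor, drbdadm connect x2 (x2 being the
resource name from your own command history).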
In any case, I hope you haven't made the guest on the main node operational in
the meantime, because you will really want to declare that node the
split-brain victim.

HTH,
Felix

_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
