On Tue, Jan 25, 2011 at 10:36:03AM +1100, Lew wrote:
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.905963] block drbd9: 
> drbd_sync_handshake:
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.905967] block drbd9: self 
> 49615ABF1622FC55:643454BA1CA67140:5625CFAB3DDD24A2:EA5079D16F8C7807 
> bits:143432 flags:0
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.905971] block drbd9: peer 
> 6116B0558277E470:643454BA1CA67140:5625CFAB3DDD24A2:EA5079D16F8C7807 
> bits:336381 flags:0

There. Both nodes have changes the other node did not see (yet).
That's where DRBD detects that the data sets have previously diverged,
usually because of a cluster split brain.
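
If you want to check this yourself, here is a rough Python sketch of the
comparison. It assumes the usual field order current:bitmap:history:history
for the UUID sets, and it is a simplification for illustration, not the
kernel's actual uuid_compare() logic (which, among other things, masks a flag
bit before comparing). The function names are made up for this example.

```python
import re

# The two handshake lines quoted above, re-joined where the mail client wrapped them.
LOG = """\
block drbd9: self 49615ABF1622FC55:643454BA1CA67140:5625CFAB3DDD24A2:EA5079D16F8C7807 bits:143432 flags:0
block drbd9: peer 6116B0558277E470:643454BA1CA67140:5625CFAB3DDD24A2:EA5079D16F8C7807 bits:336381 flags:0
"""

PATTERN = re.compile(r"(self|peer) ((?:[0-9A-F]{16}:){3}[0-9A-F]{16}) bits:(\d+)")


def parse_handshake(text):
    """Return {'self': {...}, 'peer': {...}} with UUID fields and bit counts."""
    result = {}
    for who, uuids, bits in PATTERN.findall(text):
        # Assumed field order: current, bitmap, then two history UUIDs.
        current, bitmap, *history = uuids.split(":")
        result[who] = {
            "current": current,
            "bitmap": bitmap,
            "history": history,
            "bits": int(bits),
        }
    return result


def looks_like_split_brain(hs):
    """Simplified check: different current UUIDs, same non-zero bitmap UUID."""
    s, p = hs["self"], hs["peer"]
    return (
        s["current"] != p["current"]
        and s["bitmap"] == p["bitmap"]
        and int(s["bitmap"], 16) != 0
    )


handshake = parse_handshake(LOG)
print(looks_like_split_brain(handshake))
```

Both nodes share the same second UUID but have different first UUIDs, which
is what you would expect if each side kept writing on its own after they lost
sight of each other.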

> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.905975] block drbd9: 
> uuid_compare()=100 by rule 90
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.906273] block drbd9: helper 
> command: /sbin/drbdadm split-brain minor-9
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.937925] block drbd9: conn( 
> WFReportParams -> NetworkFailure ) 
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.937935] block drbd9: asender 
> terminated
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.937938] block drbd9: Terminating 
> asender thread
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.950821] block drbd9: helper 
> command: /sbin/drbdadm split-brain minor-9 exit code 127 (0x7f00)
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.950827] block drbd9: conn( 
> NetworkFailure -> Disconnecting ) 
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.951122] block drbd9: Connection 
> closed
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.951129] block drbd9: conn( 
> Disconnecting -> StandAlone ) 
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.951149] block drbd9: receiver 
> terminated
> Jan 23 22:19:35 emlsurit-v4 kernel: [25910.951151] block drbd9: Terminating 
> receiver thread

Which is detected. DRBD cannot decide which version of your data you'd
rather keep, so the default behaviour is to drop the network connection
and no longer talk to the peer.

But 15 minutes later, you try to connect them again:

> Jan 23 22:34:37 emlsurit-v4 kernel: [26811.487616] block drbd9: conn( 
> StandAlone -> Unconnected ) 
> Jan 23 22:34:37 emlsurit-v4 kernel: [26811.487638] block drbd9: Starting 
> receiver thread (from drbd9_worker [2126])
> Jan 23 22:34:37 emlsurit-v4 kernel: [26811.487690] block drbd9: receiver 
> (re)started
> Jan 23 22:34:37 emlsurit-v4 kernel: [26811.487696] block drbd9: conn( 
> Unconnected -> WFConnection ) 
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.182513] block drbd9: Handshake 
> successful: Agreed network protocol version 91
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.182522] block drbd9: conn( 
> WFConnection -> WFReportParams ) 
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.182539] block drbd9: Starting 
> asender thread (from drbd9_receiver [20045])
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.183313] block drbd9: 
> data-integrity-alg: <not-used>
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.183340] block drbd9: 
> drbd_sync_handshake:
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.183345] block drbd9: self 
> 49615ABF1622FC55:643454BA1CA67140:5625CFAB3DDD24A2:EA5079D16F8C7807 
> bits:143799 flags:0
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.183349] block drbd9: peer 
> 6116B0558277E470:643454BA1CA67140:5625CFAB3DDD24A2:EA5079D16F8C7807 
> bits:336381 flags:0


DRBD notices that you still have not decided which version to use,
and we can see that emlsurit-v4 is still being actively modified:
its out-of-sync bit count grew from 143432 to 143799 since the first
attempt (we cannot be sure about the other node, though).
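
The arithmetic, as a small Python sketch using only the bit counts from the
two handshakes quoted here. The 4 KiB per bit figure is the usual DRBD bitmap
granularity; treat it as an assumption for your setup. The helper name is
made up for this example.

```python
# Out-of-sync bit counts reported in the two handshakes quoted in this mail.
first = {"self": 143432, "peer": 336381}   # Jan 23 22:19:35
second = {"self": 143799, "peer": 336381}  # Jan 23 22:35:04

KIB_PER_BIT = 4  # assumption: one bitmap bit covers 4 KiB


def bitmap_growth(before, after):
    """Return how many additional bits each node marked between two handshakes."""
    return {node: after[node] - before[node] for node in before}


growth = bitmap_growth(first, second)
for node, bits in growth.items():
    print(f"{node}: +{bits} bits (~{bits * KIB_PER_BIT} KiB newly marked)")
```

A growth of zero on the peer does not prove it is idle: writes to blocks that
are already marked would not change the count, and the number we see is only
what the peer reported at handshake time.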

> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.183353] block drbd9: 
> uuid_compare()=100 by rule 90
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.183610] block drbd9: helper 
> command: /sbin/drbdadm split-brain minor-9
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.192301] block drbd9: conn( 
> WFReportParams -> NetworkFailure ) 
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.192309] block drbd9: asender 
> terminated
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.192311] block drbd9: Terminating 
> asender thread
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.192702] block drbd9: helper 
> command: /sbin/drbdadm split-brain minor-9 exit code 127 (0x7f00)
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.192709] block drbd9: conn( 
> NetworkFailure -> Disconnecting ) 
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.193004] block drbd9: Connection 
> closed
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.193012] block drbd9: conn( 
> Disconnecting -> StandAlone ) 

And again, the connection is dropped.

> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.193027] block drbd9: receiver 
> terminated
> Jan 23 22:35:04 emlsurit-v4 kernel: [26838.193029] block drbd9: Terminating 
> receiver thread
> Jan 23 22:35:58 emlsurit-v4 kernel: [26892.356300] block drbd9: conn( 
> StandAlone -> Unconnected ) 
> Jan 23 22:35:58 emlsurit-v4 kernel: [26892.356326] block drbd9: Starting 
> receiver thread (from drbd9_worker [2126])
> Jan 23 22:35:58 emlsurit-v4 kernel: [26892.356519] block drbd9: receiver 
> (re)started
> Jan 23 22:35:58 emlsurit-v4 kernel: [26892.356527] block drbd9: conn( 
> Unconnected -> WFConnection ) 


So... my guess is that you still have two versions of your data.

From this log, there was no sync, because DRBD's default behaviour in that
case is to disconnect. Therefore no rollback, and no data loss.
But you certainly have diverging data sets, and my guess is they are
still diverging.
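
To double check the "no sync" part, you could pull the connection state
transitions out of the log and look for any sync state. A minimal Python
sketch, with the conn( ... ) lines from the quoted log re-joined by hand; the
set of sync-related state names is my shortlist for this illustration, not an
exhaustive list.

```python
import re

# Connection state changes from the quoted log, in order, re-joined by hand.
LOG = """\
conn( WFReportParams -> NetworkFailure )
conn( NetworkFailure -> Disconnecting )
conn( Disconnecting -> StandAlone )
conn( StandAlone -> Unconnected )
conn( Unconnected -> WFConnection )
conn( WFConnection -> WFReportParams )
conn( WFReportParams -> NetworkFailure )
conn( NetworkFailure -> Disconnecting )
conn( Disconnecting -> StandAlone )
conn( StandAlone -> Unconnected )
conn( Unconnected -> WFConnection )
"""

# Shortlist of states that would indicate a resync was started or running.
SYNC_STATES = {
    "WFBitMapS", "WFBitMapT", "WFSyncUUID",
    "SyncSource", "SyncTarget", "PausedSyncS", "PausedSyncT",
}

TRANSITION = re.compile(r"conn\(\s*(\w+)\s*->\s*(\w+)\s*\)")

transitions = TRANSITION.findall(LOG)
reached = {new for _old, new in transitions}

sync_seen = bool(reached & SYNC_STATES)
standalone_count = sum(1 for _old, new in transitions if new == "StandAlone")

print("sync state reached:", sync_seen)
print("times dropped to StandAlone:", standalone_count)
```

In this excerpt the connection never gets past WFReportParams, and it falls
back to StandAlone after each of the two handshakes.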

You have to figure out when they started to diverge, and why.
And you have to sort it out, decide which to keep,
and tell DRBD (see the User's Guide for details on this).

Consider booking DRBD Training

        ;-)

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed