On 01/29/2011 02:43 AM, Lewis Shobbrook wrote:
> That's correct, no sync has taken place & it is still un-synced.
>
> ...
>
> The resource nodes are still disconnected and no override has been used to
> force the situation.
> The only commands issued have been drbdadm connect all, drbdadm connect x2,
> drbdadm primary x2 (on the only node that has ever been primary) and drbdadm
> attach.
> I'm the only one with access to these machines; I can assure you sync has not
> been forced at any time.
>
> The only log record against this resource in all archived messages prior to
> the system restart is:
> Jan 11 11:30:31 emlsurit-v4 kernel: [7745016.672246] block drbd9: disk(
> UpToDate -> Diskless )
> I expect this is the point at which the drbdadm detach was issued, while the
> node was primary and active.
Holy shit. Now this is a useful piece of information. You made your Primary
diskless 12 days before your alleged DRBD problem. This of course leads to all
writes being done on the Secondary (you're in a degraded state). This is all
fine, except that after the reboot you made your main node (the one that had
been diskless for those 12 days) Primary again before the handshake took
place. Hence the split-brain.

> From the command history I can't determine which node the detach was issued
> from.

The one that went diskless. Detach is a local operation and doesn't affect the
peer.

> Does it matter which node a drbdadm detach is issued from?

Yes, it's essential.

> On node A it details system start Jan 23 15:07:16.
> The resource was later set primary before network connection between the
> nodes...
> Jan 23 15:53:01 emlsurit-v4 kernel: [ 2756.121108] block drbd9: role(
> Secondary -> Primary )
> Jan 23 15:53:01 emlsurit-v4 kernel: [ 2756.122546] block drbd9: Creating new
> current UUID
> A minute later we can see the KVM instance start up and libvirt access the
> resource...

As stated above, going Primary while unconnected is potentially harmful.

> Perhaps what has been confusing the matter is my initial post associating
> split-brain with the data loss.
> The node was primary and active prior to any split-brain, and it seems to me
> that the rollback/loss of data had occurred prior to split-brain.
> The only conceivable possibility to me is still that NodeA has rolled back
> or discarded changes in its activity log following the restart.

It has not. During split-brain, no data is synced until you allow it.

> As far as I can determine this occurred prior to the split-brain, while the
> resource nodes were still disconnected (prior to restoration of network
> connectivity).

Outright impossible.

> Just to be thorough, I'll export the KVM instance XML and start it up to
> investigate the other node, but do not expect to find the data that's missing
> there.

You should.
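The rule above (never promote a resource that is disconnected or diskless) can
be sketched as a small guard. This is a minimal sketch, not part of DRBD: the
helper name safe_to_promote is hypothetical, and the state strings are the
ones `drbdadm cstate` / `drbdadm dstate` report on DRBD 8.x:

```shell
# safe_to_promote CSTATE DSTATE  ->  exit 0 only if promotion looks safe.
# Hypothetical helper; in practice you would feed it the live output of
# `drbdadm cstate <res>` and `drbdadm dstate <res>`.
safe_to_promote() {
    cstate=$1
    local_dstate=${2%%/*}   # dstate looks like "UpToDate/UpToDate"; keep local half
    [ "$cstate" = "Connected" ] && [ "$local_dstate" = "UpToDate" ]
}

# Examples with states as drbdadm would print them:
safe_to_promote Connected UpToDate/UpToDate && echo "ok to promote"
safe_to_promote StandAlone Diskless/DUnknown || echo "do NOT promote"
```

Recovery from the split-brain itself would then be the usual manual procedure,
roughly: on the victim, drbdadm secondary x2 && drbdadm connect
--discard-my-data x2, and on the survivor, drbdadm connect x2 (x2 being the
resource name from your own command history).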
In any case, I hope you haven't made the guest on the main node operational in
the meantime, because you will really want to declare that node the
split-brain victim.

HTH,
Felix

_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
