Thanks for the reply, Felix.
Log extracts are included below & attached as requested.
> Hi,
> 
> On 01/24/2011 01:20 AM, Lew wrote:
> > I've encountered some unexpected behavior with a split brain instance.
> > It seems from what has occurred that the default behavior is set to
> > roll back & discard changes.
> >
> > Recently in my sand pit, I've been manually disconnecting resources as
> > an ad hoc way of maintaining a snapshot for roll back.
> > This way if I'm happy with changes, I can reconnect and within seconds
> > we're fully synced again.
> >
> > I moved a server yesterday and discovered after a drbdadm connect all,
> > that one of the resources had split brained and discarded a few days'
> > worth of work;
> > rolling back to the point in time when the resource was first
> > disconnected.
> >
> > What's interesting to me is that the disconnected secondary node had
> > never been set primary, so how did we end up in split brain?
> > I also do not understand why it was only this resource that split
> > brained, when others that existed in seemingly identical configurations
> > and states did not.
> >
> > I expect I'll need to explicitly prohibit this behavior in a global net
> > section covering after-sb-0pri etc;
> > I still don't understand why discard & roll back has been chosen as the
> > default behavior; I'm contending from my experience it should not be.
> 
> It's not.

OK, seems to me something smells then.

> > Looks to me like a few days work is lost, but if anyone knows of a way
> > to recover from a roll back discard scenario, I'd be very happy to
> > find out.
> 
> Please share pertinent logs and drbd configuration.

Config
------
resource x2 {
        protocol A;

        syncer {
                rate 100M;
        }

        on emlsurit-v4 {
                device    /dev/drbd9;
                disk      /dev/r50lvm/emlsurit-x2-drbd;
                address   192.168.254.100:7799;
                flexible-meta-disk  internal;
        }

        on emlsurit-v5 {
                device    /dev/drbd9;
                disk      /dev/r50lvm/emlsurit-x2-drbd;
                address   192.168.254.101:7799;
                meta-disk internal;
        }
}

Global Config (comments removed)
-------------
global {
        usage-count yes;
}

common {
        protocol A;

        handlers {
                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh";
                pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh";
                echo o > /proc/sysrq-trigger ; halt -f";
                local-io-error "/usr/lib/drbd/notify-io-error.sh";
                split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
        }

        startup {
        }

        disk {
                no-disk-flushes;
                no-md-flushes;
        }

        net {
        }

        syncer {
        }
}

Message log extract (a bit long to post in full; see attached).

Cheers & thanks,

Lew

Jan 23 15:07:16 emlsurit-v4 kernel: [   15.028980] block drbd9: Starting worker thread (from cqueue [1924])
Jan 23 15:07:16 emlsurit-v4 kernel: [   15.029345] block drbd9: disk( Diskless -> Attaching )
Jan 23 15:07:16 emlsurit-v4 kernel: [   15.034009] block drbd9: Found 4 transactions (192 active extents) in activity log.
Jan 23 15:07:16 emlsurit-v4 kernel: [   15.034013] block drbd9: Method to ensure write ordering: barrier
Jan 23 15:07:16 emlsurit-v4 kernel: [   15.034017] block drbd9: Backing device's merge_bvec_fn() = ffffffff81435d30
Jan 23 15:07:16 emlsurit-v4 kernel: [   15.034020] block drbd9: max_segment_size ( = BIO size ) = 4096
Jan 23 15:07:16 emlsurit-v4 kernel: [   15.034023] block drbd9: drbd_bm_resize called with capacity == 31456248
Jan 23 15:07:16 emlsurit-v4 kernel: [   15.034144] block drbd9: resync bitmap: bits=3932031 words=61438
Jan 23 15:07:16 emlsurit-v4 kernel: [   15.034146] block drbd9: size = 15 GB (15728124 KB)
Jan 23 15:07:16 emlsurit-v4 kernel: [   15.044907] block drbd9: recounting of set bits took additional 0 jiffies
Jan 23 15:07:16 emlsurit-v4 kernel: [   15.044910] block drbd9: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Jan 23 15:07:16 emlsurit-v4 kernel: [   15.044929] block drbd9: Marked additional 508 MB as out-of-sync based on AL.
Jan 23 15:07:16 emlsurit-v4 kernel: [   15.049222] block drbd9: disk( Attaching -> UpToDate )
Jan 23 15:07:16 emlsurit-v4 kernel: [   15.087220] block drbd9: conn( StandAlone -> Unconnected )
Jan 23 15:07:16 emlsurit-v4 kernel: [   15.087237] block drbd9: Starting receiver thread (from drbd9_worker [2126])
Jan 23 15:07:16 emlsurit-v4 kernel: [   15.087277] block drbd9: receiver (re)started
Jan 23 15:07:16 emlsurit-v4 kernel: [   15.087281] block drbd9: conn( Unconnected -> WFConnection )
Jan 23 15:07:16 emlsurit-v4 kernel: [   15.087332] block drbd9: conn( WFConnection -> Disconnecting )
Jan 23 15:07:16 emlsurit-v4 kernel: [   15.287540] block drbd9: Discarding network configuration.
Jan 23 15:07:16 emlsurit-v4 kernel: [   15.287690] block drbd9: Connection closed
Jan 23 15:07:16 emlsurit-v4 kernel: [   15.287699] block drbd9: conn( Disconnecting -> StandAlone )
Jan 23 15:07:16 emlsurit-v4 kernel: [   15.287823] block drbd9: receiver terminated
Jan 23 15:07:16 emlsurit-v4 kernel: [   15.287828] block drbd9: Terminating receiver thread
Jan 23 15:53:01 emlsurit-v4 kernel: [ 2756.121108] block drbd9: role( Secondary -> Primary )
Jan 23 15:53:01 emlsurit-v4 kernel: [ 2756.122546] block drbd9: Creating new current UUID
Jan 23 15:55:06 emlsurit-v4 kernel: [ 2880.172752] type=1503 audit(1295758506.227:17):  operation="open" pid=8340 parent=1787 profile="/usr/lib/libvirt/virt-aa-helper" requested_mask="r::" denied_mask="r::" fsuid=0 ouid=0 name="/dev/drbd9"
Jan 23 22:19:35 emlsurit-v4 kernel: [25910.806263] block drbd9: conn( StandAlone -> Unconnected )
Jan 23 22:19:35 emlsurit-v4 kernel: [25910.806312] block drbd9: Starting receiver thread (from drbd9_worker [2126])
Jan 23 22:19:35 emlsurit-v4 kernel: [25910.806353] block drbd9: receiver (re)started
Jan 23 22:19:35 emlsurit-v4 kernel: [25910.806359] block drbd9: conn( Unconnected -> WFConnection )
Jan 23 22:19:35 emlsurit-v4 kernel: [25910.904763] block drbd9: Handshake successful: Agreed network protocol version 91
Jan 23 22:19:35 emlsurit-v4 kernel: [25910.904772] block drbd9: conn( WFConnection -> WFReportParams )
Jan 23 22:19:35 emlsurit-v4 kernel: [25910.904796] block drbd9: Starting asender thread (from drbd9_receiver [18060])
Jan 23 22:19:35 emlsurit-v4 kernel: [25910.904936] block drbd9: data-integrity-alg: <not-used>
Jan 23 22:19:35 emlsurit-v4 kernel: [25910.905963] block drbd9: drbd_sync_handshake:
Jan 23 22:19:35 emlsurit-v4 kernel: [25910.905967] block drbd9: self 49615ABF1622FC55:643454BA1CA67140:5625CFAB3DDD24A2:EA5079D16F8C7807 bits:143432 flags:0
Jan 23 22:19:35 emlsurit-v4 kernel: [25910.905971] block drbd9: peer 6116B0558277E470:643454BA1CA67140:5625CFAB3DDD24A2:EA5079D16F8C7807 bits:336381 flags:0
Jan 23 22:19:35 emlsurit-v4 kernel: [25910.905975] block drbd9: uuid_compare()=100 by rule 90
Jan 23 22:19:35 emlsurit-v4 kernel: [25910.906273] block drbd9: helper command: /sbin/drbdadm split-brain minor-9
Jan 23 22:19:35 emlsurit-v4 kernel: [25910.937925] block drbd9: conn( WFReportParams -> NetworkFailure )
Jan 23 22:19:35 emlsurit-v4 kernel: [25910.937935] block drbd9: asender terminated
Jan 23 22:19:35 emlsurit-v4 kernel: [25910.937938] block drbd9: Terminating asender thread
Jan 23 22:19:35 emlsurit-v4 kernel: [25910.950821] block drbd9: helper command: /sbin/drbdadm split-brain minor-9 exit code 127 (0x7f00)
Jan 23 22:19:35 emlsurit-v4 kernel: [25910.950827] block drbd9: conn( NetworkFailure -> Disconnecting )
Jan 23 22:19:35 emlsurit-v4 kernel: [25910.951122] block drbd9: Connection closed
Jan 23 22:19:35 emlsurit-v4 kernel: [25910.951129] block drbd9: conn( Disconnecting -> StandAlone )
Jan 23 22:19:35 emlsurit-v4 kernel: [25910.951149] block drbd9: receiver terminated
Jan 23 22:19:35 emlsurit-v4 kernel: [25910.951151] block drbd9: Terminating receiver thread
Jan 23 22:34:37 emlsurit-v4 kernel: [26811.487616] block drbd9: conn( StandAlone -> Unconnected )
Jan 23 22:34:37 emlsurit-v4 kernel: [26811.487638] block drbd9: Starting receiver thread (from drbd9_worker [2126])
Jan 23 22:34:37 emlsurit-v4 kernel: [26811.487690] block drbd9: receiver (re)started
Jan 23 22:34:37 emlsurit-v4 kernel: [26811.487696] block drbd9: conn( Unconnected -> WFConnection )
Jan 23 22:35:04 emlsurit-v4 kernel: [26838.182513] block drbd9: Handshake successful: Agreed network protocol version 91
Jan 23 22:35:04 emlsurit-v4 kernel: [26838.182522] block drbd9: conn( WFConnection -> WFReportParams )
Jan 23 22:35:04 emlsurit-v4 kernel: [26838.182539] block drbd9: Starting asender thread (from drbd9_receiver [20045])
Jan 23 22:35:04 emlsurit-v4 kernel: [26838.183313] block drbd9: data-integrity-alg: <not-used>
Jan 23 22:35:04 emlsurit-v4 kernel: [26838.183340] block drbd9: drbd_sync_handshake:
Jan 23 22:35:04 emlsurit-v4 kernel: [26838.183345] block drbd9: self 49615ABF1622FC55:643454BA1CA67140:5625CFAB3DDD24A2:EA5079D16F8C7807 bits:143799 flags:0
Jan 23 22:35:04 emlsurit-v4 kernel: [26838.183349] block drbd9: peer 6116B0558277E470:643454BA1CA67140:5625CFAB3DDD24A2:EA5079D16F8C7807 bits:336381 flags:0
Jan 23 22:35:04 emlsurit-v4 kernel: [26838.183353] block drbd9: uuid_compare()=100 by rule 90
Jan 23 22:35:04 emlsurit-v4 kernel: [26838.183610] block drbd9: helper command: /sbin/drbdadm split-brain minor-9
Jan 23 22:35:04 emlsurit-v4 kernel: [26838.192301] block drbd9: conn( WFReportParams -> NetworkFailure )
Jan 23 22:35:04 emlsurit-v4 kernel: [26838.192309] block drbd9: asender terminated
Jan 23 22:35:04 emlsurit-v4 kernel: [26838.192311] block drbd9: Terminating asender thread
Jan 23 22:35:04 emlsurit-v4 kernel: [26838.192702] block drbd9: helper command: /sbin/drbdadm split-brain minor-9 exit code 127 (0x7f00)
Jan 23 22:35:04 emlsurit-v4 kernel: [26838.192709] block drbd9: conn( NetworkFailure -> Disconnecting )
Jan 23 22:35:04 emlsurit-v4 kernel: [26838.193004] block drbd9: Connection closed
Jan 23 22:35:04 emlsurit-v4 kernel: [26838.193012] block drbd9: conn( Disconnecting -> StandAlone )
Jan 23 22:35:04 emlsurit-v4 kernel: [26838.193027] block drbd9: receiver terminated
Jan 23 22:35:04 emlsurit-v4 kernel: [26838.193029] block drbd9: Terminating receiver thread
Jan 23 22:35:58 emlsurit-v4 kernel: [26892.356300] block drbd9: conn( StandAlone -> Unconnected )
Jan 23 22:35:58 emlsurit-v4 kernel: [26892.356326] block drbd9: Starting receiver thread (from drbd9_worker [2126])
Jan 23 22:35:58 emlsurit-v4 kernel: [26892.356519] block drbd9: receiver (re)started
Jan 23 22:35:58 emlsurit-v4 kernel: [26892.356527] block drbd9: conn( Unconnected -> WFConnection )
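
For anyone reading the drbd_sync_handshake entries in the log above, here is a
minimal Python sketch of how the self/peer UUID sets could be compared. It
rests on assumptions that are not stated in this message: that the four
colon-separated values are the current UUID, the bitmap UUID and two history
UUIDs, in that order, and that "bits" is the number of out-of-sync bitmap bits.
The 4 KB-per-bit figure is derived from the attach messages (15728124 KB across
3932031 bits). The function names are made up for illustration and are not part
of DRBD.

```python
import re

# The two handshake entries logged at 22:19:35, each joined onto one line.
LOG = """\
Jan 23 22:19:35 emlsurit-v4 kernel: [25910.905967] block drbd9: self 49615ABF1622FC55:643454BA1CA67140:5625CFAB3DDD24A2:EA5079D16F8C7807 bits:143432 flags:0
Jan 23 22:19:35 emlsurit-v4 kernel: [25910.905971] block drbd9: peer 6116B0558277E470:643454BA1CA67140:5625CFAB3DDD24A2:EA5079D16F8C7807 bits:336381 flags:0
"""

# Assumed field order: current:bitmap:history1:history2
HANDSHAKE = re.compile(
    r"block drbd\d+: (self|peer) "
    r"([0-9A-F]{16}):([0-9A-F]{16}):([0-9A-F]{16}):([0-9A-F]{16}) "
    r"bits:(\d+) flags:(\d+)"
)

# Derived from the attach messages: 15728124 KB / 3932031 bits = 4 KB per bit.
KB_PER_BIT = 4


def parse_handshake(text):
    """Return {'self': {...}, 'peer': {...}} built from the handshake lines."""
    nodes = {}
    for match in HANDSHAKE.finditer(text):
        side, current, bitmap, hist1, hist2, bits, _flags = match.groups()
        nodes[side] = {
            "current": int(current, 16),
            "bitmap": int(bitmap, 16),
            "history": (int(hist1, 16), int(hist2, 16)),
            "bits": int(bits),
        }
    return nodes


def looks_diverged(nodes):
    """True when the current UUIDs differ but a non-zero bitmap UUID is shared."""
    own, peer = nodes["self"], nodes["peer"]
    shared_bitmap = own["bitmap"] == peer["bitmap"] and own["bitmap"] != 0
    return own["current"] != peer["current"] and shared_bitmap


if __name__ == "__main__":
    nodes = parse_handshake(LOG)
    print("diverged:", looks_diverged(nodes))
    for side, info in nodes.items():
        print(side, info["bits"] * KB_PER_BIT, "KB marked out of sync")
```

On these two entries the check reports different current UUIDs over a shared
bitmap UUID. That is at least consistent with the "uuid_compare()=100 by rule
90" entry, and with the "Creating new current UUID" entry logged on emlsurit-v4
at 15:53:01 while the resource was StandAlone.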
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user