Hello,

I'm new to DRBD and run into a split brain every week or so.

I have a standard active/passive setup as described in the DRBD documentation. This is my config file:

global {
    usage-count no;
}
common {
    syncer { rate 100M; }
    protocol      C;
}
resource mailstore {
    startup {
       wfc-timeout 0;
       degr-wfc-timeout 120;
    }
    disk { on-io-error detach; no-disk-barrier; }
    on primary.axigen.cluster {
       device      /dev/drbd1;
       disk        /dev/axigen/mailstore;
       address     192.168.1.10:7791;
       meta-disk   internal;
    }
    on secondary.axigen.cluster {
       device      /dev/drbd1;
       disk        /dev/axigen/mailstore;
       address     192.168.1.20:7791;
       meta-disk   internal;
    }
}

After a few days, my cluster always ends up in the following state:

[r...@primary ~]# drbd-overview
1:mailstore StandAlone Primary/Unknown UpToDate/DUnknown r---- /var/opt/axigen ext3 99G 298M 98G 1%

[r...@secondary ~]# drbd-overview
  1:mailstore  StandAlone Secondary/Unknown Outdated/DUnknown r----

I get the following messages in dmesg on my primary node:

[r...@primary ~]# dmesg |grep block
drbd: registered as block device major 147
block drbd1: Starting worker thread (from cqueue/1 [195])
block drbd1: disk( Diskless -> Attaching )
block drbd1: Found 4 transactions (23 active extents) in activity log.
block drbd1: Method to ensure write ordering: flush
block drbd1: max_segment_size ( = BIO size ) = 32768
block drbd1: drbd_bm_resize called with capacity == 209708728
block drbd1: resync bitmap: bits=26213591 words=409588
block drbd1: size = 100 GB (104854364 KB)
block drbd1: recounting of set bits took additional 2 jiffies
block drbd1: 184 KB (46 bits) marked out-of-sync by on disk bit-map.
block drbd1: disk( Attaching -> UpToDate )
block drbd1: Barriers not supported on meta data device - disabling
block drbd1: conn( StandAlone -> Unconnected )
block drbd1: Starting receiver thread (from drbd1_worker [3504])
block drbd1: receiver (re)started
block drbd1: conn( Unconnected -> WFConnection )
block drbd1: Handshake successful: Agreed network protocol version 94
block drbd1: conn( WFConnection -> WFReportParams )
block drbd1: Starting asender thread (from drbd1_receiver [3517])
block drbd1: data-integrity-alg: <not-used>
block drbd1: drbd_sync_handshake:
block drbd1: self B0A76171352A5A3C:C23C1E60BAF36299:CDF18AFFD009B9F5:7F938649A9B876DD bits:46 flags:0
block drbd1: peer C23C1E60BAF36298:0000000000000000:CDF18AFFD009B9F4:7F938649A9B876DD bits:0 flags:0
block drbd1: uuid_compare()=1 by rule 70
block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
block drbd1: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
block drbd1: Began resync as SyncSource (will sync 184 KB [46 bits set]).
block drbd1: Resync done (total 1 sec; paused 0 sec; 184 K/sec)
block drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
block drbd1: peer( Secondary -> Primary )
block drbd1: peer( Primary -> Secondary )
block drbd1: role( Secondary -> Primary )
block drbd1: sock was shut down by peer
block drbd1: peer( Secondary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
block drbd1: short read expecting header on sock: r=0
block drbd1: asender terminated
block drbd1: Terminating asender thread
block drbd1: Creating new current UUID
block drbd1: Connection closed
block drbd1: conn( BrokenPipe -> Unconnected )
block drbd1: receiver terminated
block drbd1: Restarting receiver thread
block drbd1: receiver (re)started
block drbd1: conn( Unconnected -> WFConnection )
block drbd1: Handshake successful: Agreed network protocol version 94
block drbd1: conn( WFConnection -> WFReportParams )
block drbd1: Starting asender thread (from drbd1_receiver [3517])
block drbd1: data-integrity-alg: <not-used>
block drbd1: drbd_sync_handshake:
block drbd1: self 5A7A31FAC38B4C31:B0A76171352A5A3D:28B6F22789E39014:C23C1E60BAF36299 bits:0 flags:0
block drbd1: peer B0A76171352A5A3C:0000000000000000:28B6F22789E39014:C23C1E60BAF36299 bits:0 flags:0
block drbd1: uuid_compare()=1 by rule 70
block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
block drbd1: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
block drbd1: Began resync as SyncSource (will sync 0 KB [0 bits set]).
block drbd1: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
block drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
block drbd1: sock was shut down by peer
block drbd1: peer( Secondary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
block drbd1: short read expecting header on sock: r=0
block drbd1: asender terminated
block drbd1: Terminating asender thread
block drbd1: Creating new current UUID
block drbd1: Connection closed
block drbd1: conn( BrokenPipe -> Unconnected )
block drbd1: receiver terminated
block drbd1: Restarting receiver thread
block drbd1: receiver (re)started
block drbd1: conn( Unconnected -> WFConnection )
block drbd1: State change failed: Need access to UpToDate data
block drbd1: state = { cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown r--- }
block drbd1: wanted = { cs:WFConnection ro:Primary/Unknown ds:Outdated/DUnknown r--- }
block drbd1: role( Primary -> Secondary )
block drbd1: role( Secondary -> Primary )
block drbd1: Handshake successful: Agreed network protocol version 94
block drbd1: conn( WFConnection -> WFReportParams )
block drbd1: Starting asender thread (from drbd1_receiver [3517])
block drbd1: data-integrity-alg: <not-used>
block drbd1: drbd_sync_handshake:
block drbd1: self F601A659931B404D:5A7A31FAC38B4C31:F52C25D8178F32E6:B0A76171352A5A3D bits:2 flags:0
block drbd1: peer CDEEDF66F24D3ECC:5A7A31FAC38B4C30:F52C25D8178F32E6:B0A76171352A5A3D bits:1132 flags:0
block drbd1: uuid_compare()=100 by rule 90
block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1
block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1 exit code 0 (0x0)
block drbd1: Split-Brain detected but unresolved, dropping connection!
block drbd1: helper command: /sbin/drbdadm split-brain minor-1
block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)
block drbd1: conn( WFReportParams -> Disconnecting )
block drbd1: error receiving ReportState, l: 4!
block drbd1: asender terminated
block drbd1: Terminating asender thread
block drbd1: Connection closed
block drbd1: conn( Disconnecting -> StandAlone )
block drbd1: receiver terminated
block drbd1: Terminating receiver thread
[r...@primary ~]#

And on my secondary:

[r...@secondary ~]# dmesg |grep block
drbd: registered as block device major 147
block drbd1: Starting worker thread (from cqueue/1 [195])
block drbd1: disk( Diskless -> Attaching )
block drbd1: Found 4 transactions (46 active extents) in activity log.
block drbd1: Method to ensure write ordering: flush
block drbd1: max_segment_size ( = BIO size ) = 32768
block drbd1: drbd_bm_resize called with capacity == 209708728
block drbd1: resync bitmap: bits=26213591 words=409588
block drbd1: size = 100 GB (104854364 KB)
block drbd1: recounting of set bits took additional 3 jiffies
block drbd1: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
block drbd1: disk( Attaching -> UpToDate )
block drbd1: Barriers not supported on meta data device - disabling
block drbd1: conn( StandAlone -> Unconnected )
block drbd1: Starting receiver thread (from drbd1_worker [3527])
block drbd1: receiver (re)started
block drbd1: conn( Unconnected -> WFConnection )
block drbd1: Handshake successful: Agreed network protocol version 94
block drbd1: conn( WFConnection -> WFReportParams )
block drbd1: Starting asender thread (from drbd1_receiver [3538])
block drbd1: data-integrity-alg: <not-used>
block drbd1: drbd_sync_handshake:
block drbd1: self C23C1E60BAF36298:0000000000000000:CDF18AFFD009B9F4:7F938649A9B876DD bits:0 flags:0
block drbd1: peer B0A76171352A5A3C:C23C1E60BAF36299:CDF18AFFD009B9F5:7F938649A9B876DD bits:46 flags:0
block drbd1: uuid_compare()=-1 by rule 50
block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
block drbd1: conn( WFBitMapT -> WFSyncUUID )
block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1
block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0)
block drbd1: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
block drbd1: Began resync as SyncTarget (will sync 184 KB [46 bits set]).
block drbd1: Resync done (total 1 sec; paused 0 sec; 184 K/sec)
block drbd1: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1
block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1 exit code 0 (0x0)
block drbd1: role( Secondary -> Primary )
block drbd1: role( Primary -> Secondary )
block drbd1: peer( Secondary -> Primary )
block drbd1: PingAck did not arrive in time.
block drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
block drbd1: asender terminated
block drbd1: Terminating asender thread
block drbd1: short read expecting header on sock: r=-512
block drbd1: Connection closed
block drbd1: conn( NetworkFailure -> Unconnected )
block drbd1: receiver terminated
block drbd1: Restarting receiver thread
block drbd1: receiver (re)started
block drbd1: conn( Unconnected -> WFConnection )
block drbd1: Handshake successful: Agreed network protocol version 94
block drbd1: conn( WFConnection -> WFReportParams )
block drbd1: Starting asender thread (from drbd1_receiver [3538])
block drbd1: data-integrity-alg: <not-used>
block drbd1: drbd_sync_handshake:
block drbd1: self B0A76171352A5A3C:0000000000000000:28B6F22789E39014:C23C1E60BAF36299 bits:0 flags:0
block drbd1: peer 5A7A31FAC38B4C31:B0A76171352A5A3D:28B6F22789E39014:C23C1E60BAF36299 bits:0 flags:0
block drbd1: uuid_compare()=-1 by rule 50
block drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
block drbd1: conn( WFBitMapT -> WFSyncUUID )
block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1
block drbd1: helper command: /sbin/drbdadm before-resync-target minor-1 exit code 0 (0x0)
block drbd1: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
block drbd1: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
block drbd1: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
block drbd1: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1
block drbd1: helper command: /sbin/drbdadm after-resync-target minor-1 exit code 0 (0x0)
block drbd1: Connected in w_make_resync_request
block drbd1: PingAck did not arrive in time.
block drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
block drbd1: asender terminated
block drbd1: Terminating asender thread
block drbd1: short read expecting header on sock: r=-512
block drbd1: Connection closed
block drbd1: conn( NetworkFailure -> Unconnected )
block drbd1: receiver terminated
block drbd1: Restarting receiver thread
block drbd1: receiver (re)started
block drbd1: role( Secondary -> Primary )
block drbd1: Creating new current UUID
block drbd1: conn( Unconnected -> WFConnection )
block drbd1: role( Primary -> Secondary )
block drbd1: disk( UpToDate -> Outdated )
block drbd1: Handshake successful: Agreed network protocol version 94
block drbd1: conn( WFConnection -> WFReportParams )
block drbd1: Starting asender thread (from drbd1_receiver [3538])
block drbd1: data-integrity-alg: <not-used>
block drbd1: drbd_sync_handshake:
block drbd1: self CDEEDF66F24D3ECC:5A7A31FAC38B4C30:F52C25D8178F32E6:B0A76171352A5A3D bits:1132 flags:0
block drbd1: peer F601A659931B404D:5A7A31FAC38B4C31:F52C25D8178F32E6:B0A76171352A5A3D bits:2 flags:0
block drbd1: uuid_compare()=100 by rule 90
block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1
block drbd1: meta connection shut down by peer.
block drbd1: conn( WFReportParams -> NetworkFailure )
block drbd1: asender terminated
block drbd1: Terminating asender thread
block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1 exit code 0 (0x0)
block drbd1: Split-Brain detected but unresolved, dropping connection!
block drbd1: helper command: /sbin/drbdadm split-brain minor-1
block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)
block drbd1: conn( NetworkFailure -> Disconnecting )
block drbd1: error receiving ReportState, l: 4!
block drbd1: Connection closed
block drbd1: conn( Disconnecting -> StandAlone )
block drbd1: receiver terminated
block drbd1: Terminating receiver thread
[r...@secondary ~]#
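
Since the full logs are long, here is a minimal sketch of a filter that keeps only the lines that seem relevant to the split brain: role and peer changes, new current UUIDs, the UUID comparison result, lost connections, and the split-brain message itself. The function name drbd_sb_events and the choice of patterns are my own assumptions, not part of DRBD; the patterns are simply taken from the messages shown above.

```shell
# Read kernel log lines on stdin and print only the DRBD events that
# appear related to a split brain. The pattern list is an assumption
# based on the messages in the logs above; extend it as needed.
drbd_sb_events() {
    grep -E 'role\( |peer\( |Creating new current UUID|uuid_compare|Split-Brain detected|PingAck did not arrive|sock was shut down by peer'
}
```

It would be used on each node as `dmesg | grep block | drbd_sb_events`, which makes it easier to compare the order of promotions and disconnects on both sides.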



Could anyone please explain to me what is happening here?

Thanks in advance

Anton







_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
