Hi, I have found a strange behavior. Earlier I was working on DRBD version 8.3.8 and all remains good for "Digest integrity check FAILED" but with recent upgrade on 8.4 started to see delay in handshake process for around 7-8 sec which is causing temporarily split-brain type situation. I am very new to DRBD so please help me find this issue.Though I am planning to avoid using data-integrity-alg: md5 to avoid this error. But still need to understand what has changed in DRBD that caused the issue. I also tried updating the drbd.conf : fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5"; AS : fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t *1*"; Still no success. Any help or pointer is appreciated.
Following is the log sequence: On version 8.4.11-1 Dec 25 14:29:58 t1 kernel: block drbd0: Digest integrity check FAILED: 212623024s +4096 Dec 25 14:29:58 t1 kernel: drbd r0: error receiving Data, e: -5 l: 4112! Dec 25 14:29:58 t1 kernel: drbd r0: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown ) Dec 25 14:30:06 t1 kernel: drbd r0: Handshake successful: Agreed network protocol version 101 Dec 25 14:30:06 t1 kernel: drbd r0: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES. Dec 25 14:30:06 t1 kernel: drbd r0: Peer authenticated using 20 bytes HMAC Dec 25 14:30:06 t1 kernel: drbd r0: conn( WFConnection -> WFReportParams ) Dec 25 14:30:06 t1 kernel: drbd r0: Starting ack_recv thread (from drbd_r_r0 [46248]) Dec 25 14:30:06 t1 kernel: block drbd0: drbd_sync_handshake: Dec 25 14:30:06 t1 kernel: block drbd0: self 0238C8E28ABB978C:0000000000000000:9B86A2A3B57BABE6:9B85A2A3B57BABE7 bits:0 flags:0 Dec 25 14:30:06 t1 kernel: block drbd0: peer 134DF8CBE55C5001:0238C8E28ABB978D:9B86A2A3B57BABE7:9B85A2A3B57BABE7 bits:526 flags:0 Dec 25 14:30:06 t1 kernel: block drbd0: uuid_compare()=-1 by rule 50 Dec 25 14:30:06 t1 kernel: block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) Dec 25 14:30:06 t1 kernel: block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 1215(1), total 1215; compression: 100.0% Dec 25 14:30:06 t1 kernel: block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 1215(1), total 1215; compression: 100.0% Dec 25 14:30:06 t1 kernel: block drbd0: conn( WFBitMapT -> WFSyncUUID ) Dec 25 14:30:06 t1 kernel: block drbd0: updated sync uuid 0239C8E28ABB978C:0000000000000000:9B86A2A3B57BABE6:9B85A2A3B57BABE7 Dec 25 14:30:06 t1 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 Dec 25 14:30:06 t1 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0) Dec 25 14:30:06 t1 kernel: block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent ) Dec 25 14:30:06 t1 kernel: block drbd0: Began resync as SyncTarget (will sync 2104 KB [526 bits set]). On Version 8.3.8.1 Dec 2 06:23:18 t1 kernel: block drbd0: Digest integrity check FAILED. Dec 2 06:23:18 t1 kernel: block drbd0: error receiving Data, l: 4136! Dec 2 06:23:18 t1 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown ) Dec 2 06:23:18 t1 kernel: block drbd0: asender terminated Dec 2 06:23:18 t1 kernel: block drbd0: Terminating asender thread Dec 2 06:23:18 t1 kernel: block drbd0: Connection closed Dec 2 06:23:18 t1 kernel: block drbd0: conn( ProtocolError -> Unconnected ) Dec 2 06:23:18 t1 kernel: block drbd0: receiver terminated Dec 2 06:23:18 t1 kernel: block drbd0: Restarting receiver thread Dec 2 06:23:18 t1 kernel: block drbd0: receiver (re)started Dec 2 06:23:18 t1 kernel: block drbd0: conn( Unconnected -> WFConnection ) Dec 2 06:23:19 t1 kernel: block drbd0: Handshake successful: Agreed network protocol version 94 Dec 2 06:23:19 t1 kernel: block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC Dec 2 06:23:19 t1 kernel: block drbd0: conn( WFConnection -> WFReportParams ) Dec 2 06:23:19 t1 kernel: block drbd0: Starting asender thread (from drbd0_receiver [1909]) Dec 2 06:23:19 t1 kernel: block drbd0: data-integrity-alg: md5 Dec 2 06:23:19 t1 kernel: block drbd0: drbd_sync_handshake: Dec 2 06:23:19 t1 kernel: block drbd0: self 296E21EFF323E278:0000000000000000:404AAC56D3649880:9D5477E5D21C6D23 bits:0 flags:0 Dec 2 06:23:19 t1 kernel: block drbd0: peer 2DFCAA9A02C8F155:296E21EFF323E279:404AAC56D3649880:9D5477E5D21C6D23 bits:129 flags:0 Dec 2 06:23:19 t1 kernel: block drbd0: uuid_compare()=-1 by rule 50 Dec 2 06:23:19 t1 kernel: block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) Dec 2 06:23:19 t1 kernel: block drbd0: conn( WFBitMapT -> WFSyncUUID ) Dec 2 06:23:19 t1 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 Dec 2 06:23:19 t1 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0) Dec 2 06:23:19 t1 kernel: block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent ) Dec 2 06:23:19 t1 kernel: block drbd0: Began resync as SyncTarget (will sync 516 KB [129 bits set]). Dec 2 06:23:19 t1 kernel: block drbd0: Resync done (total 1 sec; paused 0 sec; 516 K/sec) Dec 2 06:23:19 t1 kernel: block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate ) Dec 2 06:23:19 t1 kernel: block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 Dec 2 06:23:19 t1 kernel: block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0) Thanks Vivek
_______________________________________________ Star us on GITHUB: https://github.com/LINBIT drbd-user mailing list [email protected] https://lists.linbit.com/mailman/listinfo/drbd-user
