Hello,

I am new to DRBD and am just getting it set up, and I have run into a few problems. I am running DRBD 8.3.2 on CentOS 5.3 (64-bit) with all the latest patches applied.

I am getting errors during the initial sync step; the logs are further down.

nas1 is configured with:
6 x 1TB and 2 x 250GB drives, 8GB RAM, Adaptec 5805 RAID card, RAID 5 on the 6 drives

nas2 is configured with:
4 x 1.5TB and 2 x 250GB drives, 8GB RAM, Adaptec 5805 RAID card, RAID 5 on the 4 drives

The /data partition is /dev/sdb1, which totals 4,200,000 MB (about 4TB).
The /meta partition is /dev/sdb2.

configuration: /etc/drbd.conf

uname -a:
Linux nas2.mydomainhere.com 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

config:

common {
        protocol C;
        syncer {
                rate          60M;
                al-extents    257;
        }
}
resource r0 {
        handlers {
                pri-on-incon-degr       "halt -f";
        }
        disk {
                on-io-error             detach;
        }
        startup {
                degr-wfc-timeout        120;
        }
        on nas1.mydomainhere.com {
                device          /dev/drbd0;
                disk            /dev/sdb1;
                address         XXX.XXX.137.40:7789;
                meta-disk       /dev/sdb2[0];
        }
        on nas2.mydomainhere.com {
                device          /dev/drbd0;
                disk            /dev/sdb1;
                address         XXX.XXX.137.41:7789;
                meta-disk       /dev/sdb2[0];
        }
}


When first started, it syncs for about 10-15 minutes (below you can see it is down to 3,483,404 MB left to sync out of 3,513,200 MB):

# cat /proc/drbd

version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by [email protected] , 2009-08-29 14:07:55
 0: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent C r----
    ns:30511104 nr:0 dw:0 dr:30511104 al:0 bm:1861 lo:0 pe:94 ua:0 ap:0 ep:1 wo:b oos:3567008704
        [>....................] sync'ed:  0.9% (3483404/3513200)M
        finish: 13:08:27 speed: 75,376 (58,332) K/sec
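
As a rough cross-check of the numbers above, here is a minimal Python sketch that pulls the counters out of the status line and estimates the remaining time from oos (out-of-sync data, in KB) and the current speed. The function names are my own for illustration and are not part of any DRBD tool; it only assumes the key:value layout shown in the output.

```python
import re

def parse_counters(status_text):
    """Collect key:value tokens (e.g. "cs:SyncSource", "oos:123") into a dict."""
    # ": ?" tolerates the "ap: 0" spacing that can appear in pasted output.
    return dict(re.findall(r"\b([a-z]+): ?(\S+)", status_text))

def eta_seconds(oos_kb, rate_kb_per_s):
    """Seconds left if oos_kb kilobytes sync at a steady rate_kb_per_s."""
    return oos_kb / rate_kb_per_s

def format_hms(seconds):
    """Render a number of seconds as H:MM:SS (fractions are dropped)."""
    seconds = int(seconds)
    return f"{seconds // 3600}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"

# Status line copied from the /proc/drbd output above.
status = (
    "0: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent C r---- "
    "ns:30511104 nr:0 dw:0 dr:30511104 al:0 bm:1861 lo:0 pe:94 ua:0 ap: 0 "
    "ep:1 wo:b oos:3567008704"
)
counters = parse_counters(status)
remaining = eta_seconds(int(counters["oos"]), 75376)  # 75,376 K/sec from above
print(counters["cs"], format_hms(remaining))
```

At the instantaneous speed this comes out at roughly 13 hours, close to the finish estimate that DRBD printed itself.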


Then, in /var/log/messages, these errors start appearing:

Oct 13 10:06:17 nas1 avahi-daemon[3793]: Invalid response packet.
Oct 13 10:06:17 nas1 last message repeated 9 times

As soon as that starts, we get all kinds of errors like the following (sorry for the long post; I am trying to be complete):

Oct 13 10:01:18 nas2 kernel: block drbd0: peer 136EDF2D710BB952:E0256B5135655E7D:22CB9163CBBE953B:027F952A21588331 bits:899379200 flags:0
Oct 13 10:01:18 nas2 kernel: block drbd0: uuid_compare()=-1 by rule 5
Oct 13 10:01:18 nas2 kernel: block drbd0: Becoming sync target due to disk states.
Oct 13 10:01:19 nas2 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 13 10:01:21 nas2 kernel: block drbd0: conn( WFBitMapT -> WFSyncUUID )
Oct 13 10:01:21 nas2 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
Oct 13 10:01:21 nas2 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Oct 13 10:01:21 nas2 kernel: block drbd0: conn( WFSyncUUID -> SyncTarget )
Oct 13 10:01:21 nas2 kernel: block drbd0: Began resync as SyncTarget (will sync 3597516800 KB [899379200 bits set]).
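
The two figures in the "Began resync" message above are consistent with each bitmap bit covering 4 KB (899379200 x 4 = 3597516800). Here is a small Python check of that arithmetic; the 4 KB-per-bit figure is inferred from these log numbers, and the function name is only illustrative.

```python
# Inferred from the log: 3597516800 KB / 899379200 bits = 4 KB per bit.
KB_PER_BIT = 4

def bits_to_kb(bits_set):
    """Convert a count of set bitmap bits into kilobytes still to sync."""
    return bits_set * KB_PER_BIT

print(bits_to_kb(899379200))  # bits set at the start of the resync
print(bits_to_kb(891369192))  # bits reported in the later handshakes
```

By the same conversion, the bits:891369192 value in the later handshakes works out to 3565476768 KB, which matches the oos counter that nas1 reports once the sync has stalled.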

-- start of problems here....
Oct 13 10:06:17 nas2 avahi-daemon[3971]: Invalid response packet.
Oct 13 10:06:17 nas2 last message repeated 9 times
Oct 13 10:10:41 nas2 kernel: block drbd0: PingAck did not arrive in time.
Oct 13 10:10:41 nas2 kernel: block drbd0: peer( Secondary -> Unknown ) conn( SyncTarget -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 13 10:10:41 nas2 kernel: block drbd0: asender terminated
Oct 13 10:10:41 nas2 kernel: block drbd0: Terminating asender thread
Oct 13 10:10:41 nas2 kernel: block drbd0: short read receiving data: read 3720 expected 4096
Oct 13 10:10:41 nas2 kernel: block drbd0: error receiving RSDataReply, l: 32792!
Oct 13 10:10:41 nas2 kernel: block drbd0: Connection closed
Oct 13 10:10:41 nas2 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Oct 13 10:10:41 nas2 kernel: block drbd0: receiver terminated
Oct 13 10:10:41 nas2 kernel: block drbd0: Restarting receiver thread
Oct 13 10:10:41 nas2 kernel: block drbd0: receiver (re)started
Oct 13 10:10:41 nas2 kernel: block drbd0: conn( Unconnected -> WFConnection )
Oct 13 10:11:37 nas2 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Oct 13 10:11:37 nas2 kernel: r8169: eth0: link up
Oct 13 10:11:39 nas2 kernel: block drbd0: Handshake successful: Agreed network protocol version 90
Oct 13 10:11:39 nas2 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Oct 13 10:11:39 nas2 kernel: block drbd0: Starting asender thread (from drbd0_receiver [4326])
Oct 13 10:11:39 nas2 kernel: block drbd0: data-integrity-alg: <not-used>
Oct 13 10:11:39 nas2 kernel: block drbd0: drbd_sync_handshake:
Oct 13 10:11:39 nas2 kernel: block drbd0: self 286F730971D4FFB0:0000000000000000:0000000000000000:0000000000000000 bits:891369192 flags:0
Oct 13 10:11:39 nas2 kernel: block drbd0: peer 136EDF2D710BB952:286F730971D4FFB1:E0256B5135655E7D:22CB9163CBBE953B bits:891369192 flags:0
Oct 13 10:11:39 nas2 kernel: block drbd0: uuid_compare()=-1 by rule 5
Oct 13 10:11:39 nas2 kernel: block drbd0: Becoming sync target due to disk states.
Oct 13 10:11:39 nas2 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 13 10:11:49 nas2 kernel: block drbd0: PingAck did not arrive in time.
Oct 13 10:11:49 nas2 kernel: block drbd0: peer( Secondary -> Unknown ) conn( WFBitMapT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 13 10:11:49 nas2 kernel: block drbd0: asender terminated
Oct 13 10:11:49 nas2 kernel: block drbd0: Terminating asender thread
Oct 13 10:11:49 nas2 kernel: block drbd0: error receiving ReportBitMap, l: 4088!
Oct 13 10:11:49 nas2 kernel: block drbd0: Connection closed
Oct 13 10:11:49 nas2 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Oct 13 10:11:49 nas2 kernel: block drbd0: receiver terminated
Oct 13 10:11:49 nas2 kernel: block drbd0: Restarting receiver thread
Oct 13 10:11:49 nas2 kernel: block drbd0: receiver (re)started
Oct 13 10:11:49 nas2 kernel: block drbd0: conn( Unconnected -> WFConnection )
Oct 13 10:12:18 nas2 kernel: block drbd0: Handshake successful: Agreed network protocol version 90
Oct 13 10:12:18 nas2 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Oct 13 10:12:18 nas2 kernel: block drbd0: Starting asender thread (from drbd0_receiver [4326])
Oct 13 10:12:18 nas2 kernel: block drbd0: data-integrity-alg: <not-used>
Oct 13 10:12:18 nas2 kernel: block drbd0: drbd_sync_handshake:
Oct 13 10:12:18 nas2 kernel: block drbd0: self 286F730971D4FFB0:0000000000000000:0000000000000000:0000000000000000 bits:891369192 flags:0
Oct 13 10:12:18 nas2 kernel: block drbd0: peer 136EDF2D710BB952:286F730971D4FFB1:E0256B5135655E7D:22CB9163CBBE953B bits:891369192 flags:0
Oct 13 10:12:18 nas2 kernel: block drbd0: uuid_compare()=-1 by rule 5
Oct 13 10:12:18 nas2 kernel: block drbd0: Becoming sync target due to disk states.
Oct 13 10:12:19 nas2 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 13 10:12:29 nas2 kernel: block drbd0: PingAck did not arrive in time.
Oct 13 10:12:29 nas2 kernel: block drbd0: peer( Secondary -> Unknown ) conn( WFBitMapT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 13 10:12:29 nas2 kernel: block drbd0: asender terminated
Oct 13 10:12:29 nas2 kernel: block drbd0: Terminating asender thread
Oct 13 10:12:29 nas2 kernel: block drbd0: error receiving ReportBitMap, l: 4088!
Oct 13 10:12:29 nas2 kernel: block drbd0: Connection closed
Oct 13 10:12:29 nas2 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Oct 13 10:12:29 nas2 kernel: block drbd0: receiver terminated
Oct 13 10:12:29 nas2 kernel: block drbd0: Restarting receiver thread
Oct 13 10:12:29 nas2 kernel: block drbd0: receiver (re)started
Oct 13 10:12:29 nas2 kernel: block drbd0: conn( Unconnected -> WFConnection )
Oct 13 10:12:54 nas2 kernel: block drbd0: Handshake successful: Agreed network protocol version 90
Oct 13 10:12:54 nas2 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Oct 13 10:12:54 nas2 kernel: block drbd0: Starting asender thread (from drbd0_receiver [4326])
Oct 13 10:12:54 nas2 kernel: block drbd0: data-integrity-alg: <not-used>
Oct 13 10:12:54 nas2 kernel: block drbd0: drbd_sync_handshake:
Oct 13 10:12:54 nas2 kernel: block drbd0: self 286F730971D4FFB0:0000000000000000:0000000000000000:0000000000000000 bits:891369192 flags:0
Oct 13 10:12:54 nas2 kernel: block drbd0: peer 136EDF2D710BB952:286F730971D4FFB1:E0256B5135655E7D:22CB9163CBBE953B bits:891369192 flags:0
Oct 13 10:12:54 nas2 kernel: block drbd0: uuid_compare()=-1 by rule 5
Oct 13 10:12:54 nas2 kernel: block drbd0: Becoming sync target due to disk states.
Oct 13 10:12:54 nas2 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 13 10:13:05 nas2 kernel: block drbd0: PingAck did not arrive in time.
Oct 13 10:13:05 nas2 kernel: block drbd0: peer( Secondary -> Unknown ) conn( WFBitMapT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 13 10:13:05 nas2 kernel: block drbd0: asender terminated
Oct 13 10:13:05 nas2 kernel: block drbd0: Terminating asender thread
Oct 13 10:13:05 nas2 kernel: block drbd0: error receiving ReportBitMap, l: 4088!
Oct 13 10:13:05 nas2 kernel: block drbd0: Connection closed
Oct 13 10:13:05 nas2 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Oct 13 10:13:05 nas2 kernel: block drbd0: receiver terminated
Oct 13 10:13:05 nas2 kernel: block drbd0: Restarting receiver thread
Oct 13 10:13:05 nas2 kernel: block drbd0: receiver (re)started
Oct 13 10:13:05 nas2 kernel: block drbd0: conn( Unconnected -> WFConnection )
Oct 13 10:19:07 nas2 kernel: block drbd0: receiver terminated
Oct 13 10:19:07 nas2 kernel: block drbd0: Restarting receiver thread
Oct 13 10:19:07 nas2 kernel: block drbd0: receiver (re)started
Oct 13 10:19:07 nas2 kernel: block drbd0: conn( Unconnected -> WFConnection )
Oct 13 10:19:37 nas2 kernel: block drbd0: Handshake successful: Agreed network protocol version 90
Oct 13 10:19:37 nas2 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Oct 13 10:19:37 nas2 kernel: block drbd0: Starting asender thread (from drbd0_receiver [4326])
Oct 13 10:19:37 nas2 kernel: block drbd0: data-integrity-alg: <not-used>
Oct 13 10:19:37 nas2 kernel: block drbd0: drbd_sync_handshake:
Oct 13 10:19:37 nas2 kernel: block drbd0: self 286F730971D4FFB0:0000000000000000:0000000000000000:0000000000000000 bits:891369192 flags:0
Oct 13 10:19:37 nas2 kernel: block drbd0: peer 136EDF2D710BB952:286F730971D4FFB1:E0256B5135655E7D:22CB9163CBBE953B bits:891369192 flags:0
Oct 13 10:19:37 nas2 kernel: block drbd0: uuid_compare()=-1 by rule 5
Oct 13 10:19:37 nas2 kernel: block drbd0: Becoming sync target due to disk states.
Oct 13 10:19:37 nas2 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 13 10:19:47 nas2 kernel: block drbd0: PingAck did not arrive in time.
Oct 13 10:19:47 nas2 kernel: block drbd0: peer( Secondary -> Unknown ) conn( WFBitMapT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 13 10:19:47 nas2 kernel: block drbd0: asender terminated
Oct 13 10:19:47 nas2 kernel: block drbd0: Terminating asender thread
Oct 13 10:19:47 nas2 kernel: block drbd0: error receiving ReportBitMap, l: 4088!
Oct 13 10:19:47 nas2 kernel: block drbd0: Connection closed
Oct 13 10:19:47 nas2 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Oct 13 10:19:47 nas2 kernel: block drbd0: receiver terminated
Oct 13 10:19:47 nas2 kernel: block drbd0: Restarting receiver thread
Oct 13 10:19:47 nas2 kernel: block drbd0: receiver (re)started
Oct 13 10:19:47 nas2 kernel: block drbd0: conn( Unconnected -> WFConnection )
Oct 13 10:20:17 nas2 kernel: block drbd0: Handshake successful: Agreed network protocol version 90
Oct 13 10:20:17 nas2 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Oct 13 10:20:17 nas2 kernel: block drbd0: Starting asender thread (from drbd0_receiver [4326])
Oct 13 10:20:17 nas2 kernel: block drbd0: data-integrity-alg: <not-used>
Oct 13 10:20:17 nas2 kernel: block drbd0: drbd_sync_handshake:
Oct 13 10:20:17 nas2 kernel: block drbd0: self 286F730971D4FFB0:0000000000000000:0000000000000000:0000000000000000 bits:891369192 flags:0
Oct 13 10:20:17 nas2 kernel: block drbd0: peer 136EDF2D710BB952:286F730971D4FFB1:E0256B5135655E7D:22CB9163CBBE953B bits:891369192 flags:0
Oct 13 10:20:17 nas2 kernel: block drbd0: uuid_compare()=-1 by rule 5
Oct 13 10:20:17 nas2 kernel: block drbd0: Becoming sync target due to disk states.
Oct 13 10:20:17 nas2 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 13 10:20:27 nas2 kernel: block drbd0: PingAck did not arrive in time.
Oct 13 10:20:27 nas2 kernel: block drbd0: peer( Secondary -> Unknown ) conn( WFBitMapT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 13 10:20:27 nas2 kernel: block drbd0: asender terminated
Oct 13 10:20:27 nas2 kernel: block drbd0: Terminating asender thread
Oct 13 10:20:27 nas2 kernel: block drbd0: error receiving ReportBitMap, l: 4088!
Oct 13 10:20:27 nas2 kernel: block drbd0: Connection closed
Oct 13 10:20:27 nas2 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Oct 13 10:20:27 nas2 kernel: block drbd0: receiver terminated
Oct 13 10:20:27 nas2 kernel: block drbd0: Restarting receiver thread
Oct 13 10:20:27 nas2 kernel: block drbd0: receiver (re)started
Oct 13 10:20:27 nas2 kernel: block drbd0: conn( Unconnected -> WFConnection )
Oct 13 10:20:48 nas2 kernel: block drbd0: Handshake successful: Agreed network protocol version 90
Oct 13 10:20:48 nas2 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Oct 13 10:20:48 nas2 kernel: block drbd0: Starting asender thread (from drbd0_receiver [4326])
Oct 13 10:20:48 nas2 kernel: block drbd0: data-integrity-alg: <not-used>
Oct 13 10:20:48 nas2 kernel: block drbd0: drbd_sync_handshake:
Oct 13 10:20:48 nas2 kernel: block drbd0: self 286F730971D4FFB0:0000000000000000:0000000000000000:0000000000000000 bits:891369192 flags:0
Oct 13 10:20:48 nas2 kernel: block drbd0: peer 136EDF2D710BB952:286F730971D4FFB1:E0256B5135655E7D:22CB9163CBBE953B bits:891369192 flags:0
Oct 13 10:20:48 nas2 kernel: block drbd0: uuid_compare()=-1 by rule 5
Oct 13 10:20:48 nas2 kernel: block drbd0: Becoming sync target due to disk states.
Oct 13 10:20:48 nas2 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Oct 13 10:20:59 nas2 kernel: block drbd0: PingAck did not arrive in time.
Oct 13 10:20:59 nas2 kernel: block drbd0: peer( Secondary -> Unknown ) conn( WFBitMapT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 13 10:20:59 nas2 kernel: block drbd0: asender terminated
Oct 13 10:20:59 nas2 kernel: block drbd0: Terminating asender thread
Oct 13 10:20:59 nas2 kernel: block drbd0: error receiving ReportBitMap, l: 4088!
Oct 13 10:20:59 nas2 kernel: block drbd0: Connection closed
Oct 13 10:20:59 nas2 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Oct 13 10:20:59 nas2 kernel: block drbd0: receiver terminated
Oct 13 10:20:59 nas2 kernel: block drbd0: Restarting receiver thread
Oct 13 10:20:59 nas2 kernel: block drbd0: receiver (re)started
Oct 13 10:20:59 nas2 kernel: block drbd0: conn( Unconnected -> WFConnection )
Oct 13 10:21:17 nas2 avahi-daemon[3971]: Invalid response packet.
Oct 13 10:21:17 nas2 last message repeated 4 times
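
For a log this repetitive, a quick tally of the recurring events can be easier to read than the raw lines. Below is a minimal Python sketch (Python 3.7 or later) that counts a few of the messages seen above. The choice of which messages to count and the label names are my own assumptions, not anything DRBD defines. It splits on the syslog timestamps rather than on newlines, so it does not matter how the lines were wrapped when pasted.

```python
import re
from collections import Counter

# Substrings to look for; the labels on the left are illustrative shorthand.
PATTERNS = {
    "pingack_timeout": "PingAck did not arrive in time",
    "netdev_watchdog": "NETDEV WATCHDOG",
    "handshake_ok": "Handshake successful",
}

# Zero-width match in front of each syslog timestamp such as "Oct 13 10:10:41 ".
ENTRY_START = re.compile(r"(?=[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d )")

def split_entries(text):
    """Split raw log text into one string per timestamped entry."""
    return [part.strip() for part in ENTRY_START.split(text) if part.strip()]

def summarize(text):
    """Count how many entries contain each pattern of interest."""
    counts = Counter()
    for entry in split_entries(text):
        for label, needle in PATTERNS.items():
            if needle in entry:
                counts[label] += 1
    return counts
```

It could be run as summarize(open("/var/log/messages").read()) on each node, which makes it easy to compare how often the connection dropped on nas1 versus nas2.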

At this point, ssh on nas2 is extremely sluggish, to the point that it takes 30-60 seconds to type anything.

As you can see, it is also no longer syncing:

[r...@nas1 ~]# cat /proc/drbd
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by [email protected] , 2009-08-29 14:07:55
 0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/Inconsistent C r----
    ns:32040144 nr:0 dw:0 dr:32048320 al:0 bm:1955 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:3565476768

My only thought is that the switch between the two machines is bad, but why would that lock up the machine?

Thanks,
Marc