> I'm getting the following in my logs:
>
> [49294.905016] drbd1: PingAck did not arrive in time.
> [49294.905027] drbd1: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> [49294.905041] drbd1: asender terminated
> [49294.905044] drbd1: Terminating asender thread
> [49294.905073] drbd1: Creating new current UUID
> [49294.905119] drbd1: short read expecting header on sock: r=-512
> [49294.937101] drbd1: Connection closed
> [49294.937109] drbd1: conn( NetworkFailure -> Unconnected )
> [49294.937115] drbd1: receiver terminated
> [49294.937118] drbd1: Restarting receiver thread
> [49294.937121] drbd1: receiver (re)started
> [49294.937126] drbd1: conn( Unconnected -> WFConnection )
> [49301.225043] drbd1: Handshake successful: Agreed network protocol version 89
> [49301.225054] drbd1: conn( WFConnection -> WFReportParams )
> [49301.282351] drbd1: Starting asender thread (from drbd1_receiver [7260])
> [49301.282444] drbd1: data-integrity-alg: <not-used>
> [49301.427792] drbd1: drbd_sync_handshake:
> [49301.427799] drbd1: self E0C0EE5EFF7CFA03:AAC217D2C4C35171:0855658CB5E4342B:283A152D88823265 bits:1754 flags:0
> [49301.427805] drbd1: peer AAC217D2C4C35170:0000000000000000:0855658CB5E4342A:283A152D88823265 bits:0 flags:4
> [49301.427809] drbd1: uuid_compare()=1 by rule 7
> [49301.488733] drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
> [49542.834691] drbd1: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
> [49542.834714] drbd1: Began resync as SyncSource (will sync 7016 KB [1754 bits set]).
> [49548.580310] drbd1: Resync done (total 5 sec; paused 0 sec; 1400 K/sec)
> [49548.580323] drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
>
> The network connection is a bonded gigabit direct cable. The two nics
> on each are different hardware. It's as redundant as I can make it.
> I've stress-tested it and in 1 TB of data transfer the connection did
> not incur a single error.
>
> I have no idea why drbd is saying that the connection is failing (if
> that's what the above log is saying.) This happens every 5 or 10
> minutes, rendering the drbd filesystem nearly unusable.
>
> Is there a setting or something I can use to test this? Or some way to
> relax the timing for the PingAck?
>
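On relaxing the PingAck timing: the keep-alive behaviour is controlled from the net section of drbd.conf. The fragment below is only a sketch, assuming a DRBD 8.x release that supports the ping-int and ping-timeout options; the resource name r0 and the values shown are illustrative assumptions, so check the drbd.conf man page for your installed version before using them. Raising the timeout may hide the symptom without explaining why the PingAck is late.

```
resource r0 {              # hypothetical resource name; use your own
  net {
    ping-int     10;       # seconds between keep-alive pings (10 is the usual default)
    ping-timeout 20;       # tenths of a second to wait for the PingAck
                           # (usual default is 5, i.e. 500 ms; 20 means 2 s)
  }
}
```

After editing the file on both nodes, something like "drbdadm adjust r0" should apply the change to the running resource.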
If you ping the other node, does it respond right away? If you have some
kind of lookup issue, a ping may appear to hang even though the response
itself is very fast.

Darren

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
