> I'm getting the following in my logs:
> 
> [49294.905016] drbd1: PingAck did not arrive in time.
> [49294.905027] drbd1: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> [49294.905041] drbd1: asender terminated
> [49294.905044] drbd1: Terminating asender thread
> [49294.905073] drbd1: Creating new current UUID
> [49294.905119] drbd1: short read expecting header on sock: r=-512
> [49294.937101] drbd1: Connection closed
> [49294.937109] drbd1: conn( NetworkFailure -> Unconnected )
> [49294.937115] drbd1: receiver terminated
> [49294.937118] drbd1: Restarting receiver thread
> [49294.937121] drbd1: receiver (re)started
> [49294.937126] drbd1: conn( Unconnected -> WFConnection )
> [49301.225043] drbd1: Handshake successful: Agreed network protocol version 89
> [49301.225054] drbd1: conn( WFConnection -> WFReportParams )
> [49301.282351] drbd1: Starting asender thread (from drbd1_receiver [7260])
> [49301.282444] drbd1: data-integrity-alg: <not-used>
> [49301.427792] drbd1: drbd_sync_handshake:
> [49301.427799] drbd1: self E0C0EE5EFF7CFA03:AAC217D2C4C35171:0855658CB5E4342B:283A152D88823265 bits:1754 flags:0
> [49301.427805] drbd1: peer AAC217D2C4C35170:0000000000000000:0855658CB5E4342A:283A152D88823265 bits:0 flags:4
> [49301.427809] drbd1: uuid_compare()=1 by rule 7
> [49301.488733] drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
> [49542.834691] drbd1: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
> [49542.834714] drbd1: Began resync as SyncSource (will sync 7016 KB [1754 bits set]).
> [49548.580310] drbd1: Resync done (total 5 sec; paused 0 sec; 1400 K/sec)
> [49548.580323] drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
> 
> The network connection is a bonded gigabit direct cable.  The two nics
> on each are different hardware.  It's as redundant as I can make it.
> I've stress-tested it and in 1 TB of data transfer the connection did
> not incur a single error.
> 
> I have no idea why drbd is saying that the connection is failing (if
> that's what the above log is saying.)  This happens every 5 or 10
> minutes, rendering the drbd filesystem nearly unusable.
> 
> Is there a setting or something I can use to test this?  Or some way
> to relax the timing for the PingAck?
> 

If you ping the other node, does it respond right away? If you have some
kind of name lookup issue, a ping may appear to hang even though the
response itself is very fast.
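
One way to separate the two is to time the name lookup by itself. The
following is only a rough sketch, not anything DRBD-specific: the
function name time_lookup is made up for illustration, and it assumes
Python 3 is available on the node.

```python
import socket
import sys
import time


def time_lookup(host):
    """Resolve host and return (elapsed_seconds, sorted list of addresses).

    Raises socket.gaierror if the name cannot be resolved at all.
    """
    start = time.perf_counter()
    infos = socket.getaddrinfo(host, None)
    elapsed = time.perf_counter() - start
    # Each entry ends with a sockaddr tuple whose first field is the address.
    addresses = sorted({info[4][0] for info in infos})
    return elapsed, addresses


if __name__ == "__main__":
    # The host name is passed on the command line; "localhost" is a placeholder.
    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    elapsed, addresses = time_lookup(host)
    print(f"{host}: resolved to {addresses} in {elapsed * 1000:.1f} ms")
```

If the lookup takes seconds while the round trip itself is fast, the
resolver is the likely culprit rather than the link. Running ping with
-n, which skips reverse lookups on the replies, is a quick cross-check.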

Darren
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
