I'm getting the following in my logs:
[49294.905016] drbd1: PingAck did not arrive in time.
[49294.905027] drbd1: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
[49294.905041] drbd1: asender terminated
[49294.905044] drbd1: Terminating asender thread
[49294.905073] drbd1: Creating new current UUID
[49294.905119] drbd1: short read expecting header on sock: r=-512
[49294.937101] drbd1: Connection closed
[49294.937109] drbd1: conn( NetworkFailure -> Unconnected )
[49294.937115] drbd1: receiver terminated
[49294.937118] drbd1: Restarting receiver thread
[49294.937121] drbd1: receiver (re)started
[49294.937126] drbd1: conn( Unconnected -> WFConnection )
[49301.225043] drbd1: Handshake successful: Agreed network protocol version 89
[49301.225054] drbd1: conn( WFConnection -> WFReportParams )
[49301.282351] drbd1: Starting asender thread (from drbd1_receiver [7260])
[49301.282444] drbd1: data-integrity-alg: <not-used>
[49301.427792] drbd1: drbd_sync_handshake:
[49301.427799] drbd1: self E0C0EE5EFF7CFA03:AAC217D2C4C35171:0855658CB5E4342B:283A152D88823265 bits:1754 flags:0
[49301.427805] drbd1: peer AAC217D2C4C35170:0000000000000000:0855658CB5E4342A:283A152D88823265 bits:0 flags:4
[49301.427809] drbd1: uuid_compare()=1 by rule 7
[49301.488733] drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
[49542.834691] drbd1: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
[49542.834714] drbd1: Began resync as SyncSource (will sync 7016 KB [1754 bits set]).
[49548.580310] drbd1: Resync done (total 5 sec; paused 0 sec; 1400 K/sec)
[49548.580323] drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
The network connection is a bonded gigabit link over direct cable. The
two NICs in each host are different hardware, so it's as redundant as I
can make it. I've stress-tested it, and in 1 TB of data transfer the
connection did not incur a single error.
I have no idea why DRBD is saying that the connection is failing (if
that's what the above log is saying). This happens every 5 or 10
minutes, rendering the DRBD filesystem nearly unusable.
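
To put a firmer number on "every 5 or 10 minutes", something like the following could pull the PingAck timeouts out of the kernel log and report the gaps between them. This is only a rough sketch: the function names are made up for illustration, and the regex assumes the `[seconds.micros] drbdN: message` prefix seen in the log above.

```python
import re

# Assumes the "[49294.905016] drbd1: message" prefix shown in the log above.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(drbd\d+):\s+(.*)$")


def pingack_timeouts(lines, device="drbd1"):
    """Return kernel timestamps (in seconds) of PingAck timeouts for one device."""
    stamps = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group(2) == device and "PingAck did not arrive" in m.group(3):
            stamps.append(float(m.group(1)))
    return stamps


def gaps(stamps):
    """Return the seconds elapsed between consecutive timeouts."""
    return [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]
```

Saving the kernel log to a file and calling `gaps(pingack_timeouts(open(path)))` on it would list the intervals. The excerpt above contains only one PingAck timeout, so a longer stretch of log is needed before any gaps show up.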
Is there a setting I can change, or a tool I can use, to test this? Or
some way to relax the timing for the PingAck?
selene:/data10# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by r...@selene, 2009-04-12 06:33:22
 1: cs:WFBitMapS ro:Primary/Secondary ds:UpToDate/UpToDate C r---d
    ns:1638484 nr:0 dw:1383680 dr:2202377 al:4564 bm:627 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:16
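
To watch the connection state while this is happening, without eyeballing /proc/drbd each time, a small parser along these lines might help. It is a minimal sketch that assumes the `key:value` layout shown above; `parse_proc_drbd` is a name invented here, not part of DRBD.

```python
import re

# Two- or three-letter keys followed by a value, e.g. "cs:WFBitMapS" or "oos:16".
FIELD_RE = re.compile(r"\b([a-z]{2,3}):(\S+)")
# A device block starts with the minor number, e.g. " 1: cs:...".
DEVICE_RE = re.compile(r"^\s*(\d+):\s+cs:")


def parse_proc_drbd(text):
    """Collect the key:value tokens for each minor device in /proc/drbd output."""
    devices = {}
    current = None
    for line in text.splitlines():
        m = DEVICE_RE.match(line)
        if m:
            current = devices.setdefault(int(m.group(1)), {})
        if current is not None:
            # Header lines before the first device are skipped; values stay strings.
            current.update(FIELD_RE.findall(line))
    return devices
```

On the output above this yields `cs` as `"WFBitMapS"` and `oos` as `"16"` for minor 1. Polling it in a loop would show how long the resource sits in each connection state.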
selene:/data10# cat /etc/drbd.conf
global {
    usage-count yes;
}
common {
    protocol C;
}
resource r0 {
    syncer {
        rate 80M;
    }
    on selene {
        device    /dev/drbd1;
        disk      /dev/md10;
        address   10.254.254.6:7789;
        meta-disk internal;
    }
    on eos.seiner.lan {
        device    /dev/drbd1;
        disk      /dev/md12;
        address   10.254.254.2:7789;
        meta-disk internal;
    }
}
--
Yan Seiner