[Linux-HA] drbd network connection failure

Yan Seiner Mon, 11 May 2009 07:23:00 -0700

I'm getting the following in my logs:

[49294.905016] drbd1: PingAck did not arrive in time.
[49294.905027] drbd1: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
[49294.905041] drbd1: asender terminated
[49294.905044] drbd1: Terminating asender thread
[49294.905073] drbd1: Creating new current UUID
[49294.905119] drbd1: short read expecting header on sock: r=-512
[49294.937101] drbd1: Connection closed
[49294.937109] drbd1: conn( NetworkFailure -> Unconnected )
[49294.937115] drbd1: receiver terminated
[49294.937118] drbd1: Restarting receiver thread
[49294.937121] drbd1: receiver (re)started
[49294.937126] drbd1: conn( Unconnected -> WFConnection )
[49301.225043] drbd1: Handshake successful: Agreed network protocol version 89
[49301.225054] drbd1: conn( WFConnection -> WFReportParams )
[49301.282351] drbd1: Starting asender thread (from drbd1_receiver [7260])
[49301.282444] drbd1: data-integrity-alg: <not-used>
[49301.427792] drbd1: drbd_sync_handshake:
[49301.427799] drbd1: self E0C0EE5EFF7CFA03:AAC217D2C4C35171:0855658CB5E4342B:283A152D88823265 bits:1754 flags:0
[49301.427805] drbd1: peer AAC217D2C4C35170:0000000000000000:0855658CB5E4342A:283A152D88823265 bits:0 flags:4
[49301.427809] drbd1: uuid_compare()=1 by rule 7
[49301.488733] drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
[49542.834691] drbd1: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
[49542.834714] drbd1: Began resync as SyncSource (will sync 7016 KB [1754 bits set]).
[49548.580310] drbd1: Resync done (total 5 sec; paused 0 sec; 1400 K/sec)
[49548.580323] drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
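For a longer log, the state changes above can be pulled out mechanically. This is a minimal sketch, assuming plain-text kernel log lines in the "[seconds] drbdN: message" form shown above; the helper names are hypothetical, and the 4 KiB-per-bit figure is an assumption about DRBD's bitmap granularity that happens to match this log (1754 bits set, 7016 KB to sync).

```python
import re

# "[49294.937109] drbd1: conn( NetworkFailure -> Unconnected )"
# search() rather than match() so syslog-prefixed lines also work.
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(drbd\d+):\s+(.*)$")
CONN_RE = re.compile(r"conn\(\s*(\w+)\s*->\s*(\w+)\s*\)")
RESYNC_RE = re.compile(r"will sync (\d+) KB \[(\d+) bits set\]")

KB_PER_BITMAP_BIT = 4  # assumption: one bitmap bit tracks 4 KiB of the device


def conn_transitions(lines):
    """Return (timestamp, device, old_state, new_state) for every conn( ) change."""
    found = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        c = CONN_RE.search(m.group(3))
        if c:
            found.append((float(m.group(1)), m.group(2), c.group(1), c.group(2)))
    return found


def resync_sizes(lines):
    """Return (reported_kb, bits_set, bits_set * 4) for every 'Began resync' line."""
    sizes = []
    for line in lines:
        r = RESYNC_RE.search(line)
        if r:
            kb, bits = int(r.group(1)), int(r.group(2))
            sizes.append((kb, bits, bits * KB_PER_BITMAP_BIT))
    return sizes


# A few of the unwrapped lines from the log above.
SAMPLE = [
    "[49294.905027] drbd1: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )",
    "[49294.937109] drbd1: conn( NetworkFailure -> Unconnected )",
    "[49294.937126] drbd1: conn( Unconnected -> WFConnection )",
    "[49301.225054] drbd1: conn( WFConnection -> WFReportParams )",
    "[49301.488733] drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )",
    "[49542.834691] drbd1: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )",
    "[49542.834714] drbd1: Began resync as SyncSource (will sync 7016 KB [1754 bits set]).",
    "[49548.580323] drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )",
]

if __name__ == "__main__":
    trans = conn_transitions(SAMPLE)
    for ts, dev, old, new in trans:
        print(f"{ts:12.6f} {dev}: {old} -> {new}")
    down = next(ts for ts, _, _, new in trans if new == "NetworkFailure")
    up = next(ts for ts, _, _, new in trans if new == "Connected" and ts > down)
    print(f"NetworkFailure to Connected: {up - down:.1f} s")
    print(resync_sizes(SAMPLE))
```

In the excerpt above, most of the roughly four minutes between NetworkFailure and Connected is spent in WFBitMapS (from about 49301.5 s to 49542.8 s); the resync itself takes about 5 seconds.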


The network connection is a bonded gigabit link over direct cables.  The
two NICs in each machine are different hardware.  It's as redundant as I
can make it.  I've stress-tested it, and over 1 TB of data transfer the
connection did not incur a single error.

I have no idea why drbd is saying that the connection is failing (if
that's what the above log is saying).  This happens every 5 or 10
minutes, rendering the drbd filesystem nearly unusable.
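To put a number on "every 5 or 10 minutes", a sketch like the following could measure the gaps between successive "PingAck did not arrive in time" messages. It assumes the bracketed kernel timestamps shown above (seconds since boot), so gaps are only meaningful within a single boot; the script name and the saved log file are illustrative, not part of any DRBD tool.

```python
import re
import sys

# Works on raw dmesg lines and on syslog lines that carry the kernel timestamp.
PINGACK_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(drbd\d+): PingAck did not arrive in time")


def pingack_gaps(lines):
    """Return {device: [seconds between successive PingAck timeouts]}."""
    last_seen = {}
    gaps = {}
    for line in lines:
        m = PINGACK_RE.search(line)
        if not m:
            continue
        ts, dev = float(m.group(1)), m.group(2)
        if dev in last_seen:
            gaps.setdefault(dev, []).append(ts - last_seen[dev])
        last_seen[dev] = ts
    return gaps


def main(paths):
    for path in paths:
        with open(path) as f:
            for dev, gs in pingack_gaps(f).items():
                print(f"{path} {dev}: {len(gs)} gaps, "
                      f"min {min(gs):.0f} s, max {max(gs):.0f} s, "
                      f"mean {sum(gs) / len(gs):.0f} s")


if __name__ == "__main__":
    # Hypothetical usage: dmesg > drbd.log; python3 pingack_gaps.py drbd.log
    main(sys.argv[1:])
```

A very regular gap would point toward something periodic on the hosts or the link; a scattered one would not.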

Is there a setting or something I can use to test this?  Or some way to 
relax the timing for the PingAck?

selene:/data10# cat /proc/drbd
version: 8.3.1 (api:88/proto:86-89)
GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by r...@selene, 2009-04-12 06:33:22

 1: cs:WFBitMapS ro:Primary/Secondary ds:UpToDate/UpToDate C r---d
    ns:1638484 nr:0 dw:1383680 dr:2202377 al:4564 bm:627 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:16
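While reproducing the problem it may help to poll /proc/drbd as well as the log. This is a minimal parsing sketch, assuming the 8.3-style layout above (an "N: cs:..." line followed by indented key:value fields); the function name is hypothetical. Here cs is the connection state, ro the roles, ds the disk states, and oos the amount currently out of sync in KiB.

```python
import re

RESOURCE_RE = re.compile(r"^\s*(\d+):\s+(cs:.*)$")


def parse_proc_drbd(text):
    """Return {minor: {field: value}} from the contents of /proc/drbd."""
    resources = {}
    current = None
    for line in text.splitlines():
        m = RESOURCE_RE.match(line)
        if m:
            current = resources.setdefault(int(m.group(1)), {})
            rest = m.group(2)
        elif current is not None and line[:1] in (" ", "\t"):
            rest = line  # indented continuation line: ns:, nr:, dw:, ...
        else:
            current = None  # version header, blank line, etc.
            continue
        for token in rest.split():
            key, sep, value = token.partition(":")
            if sep:  # skip bare tokens such as the protocol "C" and "r---d"
                current[key] = value
    return resources


SAMPLE = """\
version: 8.3.1 (api:88/proto:86-89)

 1: cs:WFBitMapS ro:Primary/Secondary ds:UpToDate/UpToDate C r---d
    ns:1638484 nr:0 dw:1383680 dr:2202377 al:4564 bm:627 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:16
"""

if __name__ == "__main__":
    for minor, fields in parse_proc_drbd(SAMPLE).items():
        print(minor, fields["cs"], fields["ds"], fields["oos"])
    # On a live node, pass open("/proc/drbd").read() instead of SAMPLE.
```

Calling it in a loop every second or so and printing only when cs changes would give a second timeline to compare against the kernel log.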

selene:/data10# cat /etc/drbd.conf
global {
  usage-count yes;
}
common {
  protocol C;
}
resource r0 {
  syncer {
    rate 80M;
  }

  on selene {
    device    /dev/drbd1;
    disk      /dev/md10;
    address   10.254.254.6:7789;
    meta-disk internal;
  }
  on eos.seiner.lan {
    device    /dev/drbd1;
    disk      /dev/md12;
    address   10.254.254.2:7789;
    meta-disk internal;
  }
}


-- 
Yan Seiner 



_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
