On Wed, Oct 12, 2016 at 04:35:58PM +0200, Jan Schermer wrote:
> Shot in the dark: are the drives (or their controller, if you're
> using RAID) using any form of caching? It is conceivable that when
> resync is finished it tries flushing the data to the device, and if
> this takes waaaaay too long it could lead to a timeout of the drbd kernel
> thread.
> Is IO happening on those drives when they are resyncing?
> Try running something like "sync ; sleep 1 ; sync" on the Inconsistent
> node when it's resyncing (I hope that won't kill your IO)

sync only affects data in the Linux (buffer/)page cache;
DRBD sits below that,
so it has "no effect" on DRBD IO.

> > Oct 12 06:56:11 ha14a kernel: block drbd1: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
> > Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: PingAck did not arrive in time.
> > Oct 12 06:56:12 ha14a kernel: d-con ha02_mysql: peer( Primary -> Unknown ) conn( SyncTarget -> NetworkFailure ) pdsk( UpToDate -> DUnknown )

As has been said before:
the DRBD ping timeout is apparently too short for the latency in your setup.
Increase it appropriately.

Where "latency" in this case means network RTT, plus kernel thread
scheduling, plus possibly additional synchronous (flush/FUA) IO, plus
whatever else DRBD needs for a full DRBD-to-DRBD round trip.
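In drbd.conf terms, that means raising ping-timeout in the net section of
the resource. A minimal sketch, assuming DRBD 8.4-style configuration and
the resource name ha02_mysql taken from the log above; the value 30 is an
arbitrary illustration, not a recommendation. Size it from your own
worst-case round trip as described above.

```
resource ha02_mysql {
  net {
    # ping-timeout is given in tenths of a second;
    # the default is 5, i.e. 0.5 s.
    ping-timeout 30;   # 3 s, example value only
  }
  # ... rest of the existing resource definition unchanged ...
}
```

After editing the configuration on both nodes, running
"drbdadm adjust ha02_mysql" should apply the changed net options to the
running resource.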

> > However, I can guarantee that the network connection is solid.
> > Running ping flood, I get 30,000 packets sent with no loss or
> > latency.

Mind telling us the network characteristics?  IO backend?
Virtualized?  Distribution? Kernel and DRBD version(s)?

: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT
please don't Cc me, but send to list -- I'm subscribed
drbd-user mailing list
