Re: [Linux-HA] HB + DRBD + high I/O load = failed failover (sometimes)

Steve Wray Sun, 08 Jun 2008 20:35:40 -0700

Rodrigo Borges Pereira wrote:

Hello,


I have a two node cluster that occasionally has a weird behavior. The
cluster runs a number of Xen VM's with virtual disk files on top of a DRBD
device. Every night backups are done of each of the VM's, via rsync/ssh.
Sometimes, the load this generates causes hb to try to failover. Then for
some reason it fails to do so, and stays on the primary node. So all the
VM's shutdown and then boot again, on the same node.



Ok.. for what it's worth...

I've been doing a fair bit of work with DRBD under Xen, i.e. the domU is running drbd.

I found that under high I/O load the DRBD subsystem would get errors such as "PingAck did not arrive in time". Sometimes the nodes would lose contact with one another and not automatically re-establish their link.

I tried about a bazillion different things to try to fix the problem, from low-level network configuration to various drbd configuration options, timeouts etc. Nothing worked.


There was one single thing which worked.

In the domU config you can set a rate limit on the virtual network interface.


Setting this to 20MB/s fixed the problem. Yeah, 20M*B*/s (megabytes), not 'b' (bits).

The config looks like this:

vif = [ 'rate=20MB/s, bridge=xenbr0' ]
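To make the bytes-versus-bits point concrete, here is a rough Python sketch of the arithmetic. The helper name rate_to_mbit is hypothetical and only for illustration, it is not part of Xen, and it assumes decimal prefixes (1 M = 10**6), which may not match how Xen interprets them internally.

```python
import re

# Assumed decimal multipliers (1 M = 10**6); this is an illustration,
# not a statement about how Xen itself parses the prefix.
_PREFIX = {"": 1, "K": 10**3, "M": 10**6, "G": 10**9}


def rate_to_mbit(rate):
    """Convert a rate string such as '20MB/s' or '20Mb/s' to Mbit/s.

    Hypothetical helper: 'B' is taken to mean bytes and 'b' bits.
    """
    m = re.fullmatch(r"(\d+)([KMG]?)([Bb])/s", rate.strip())
    if m is None:
        raise ValueError("unrecognised rate: %r" % rate)
    number, prefix, unit = m.groups()
    bits_per_unit = 8 if unit == "B" else 1
    return int(number) * _PREFIX[prefix] * bits_per_unit / 10**6


print(rate_to_mbit("20MB/s"))  # 160.0 (megabytes per second, as in the config above)
print(rate_to_mbit("20Mb/s"))  # 20.0  (what you would get with a lowercase 'b')
```

So under those assumptions the limit above works out to roughly 160 Mbit/s, eight times what the lowercase spelling would give.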

Since I introduced this and rolled back all of my other optimisations and tweaking, everything is *fine* with drbd.

I'm pretty sure this has to do with timeout definitions, but what would be
the best locations to tune that?

Yeah, I thought my problem was a timeout issue... I spent a lot of time gradually increasing the drbd timeout values to insane levels with no luck.

TIA,
Rodrigo

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


