Re: [Linux-HA] HB + DRBD + high I/O load = failed failover (sometimes)

Dejan Muhamedagic Wed, 04 Jun 2008 13:14:30 -0700

Hi,

On Wed, Jun 04, 2008 at 08:37:50PM +0100, Rodrigo Borges Pereira wrote:
> Hello,
> 
> I have a two-node cluster that occasionally behaves oddly. The
> cluster runs a number of Xen VMs with virtual disk files on top of a DRBD
> device. Every night each of the VMs is backed up via rsync/ssh.
> Sometimes the load this generates causes hb to try to fail over.


Why?

> Then for
> some reason it fails to do so,

Logs should say why it fails.

> and stays on the primary node. So all the
> VMs shut down and then boot again, on the same node.
> 
> I'm pretty sure this has to do with timeout definitions, but where would be
> the best place to tune them?

Your feeling may be right, but only logs could give us the whole
story. If you're seeing "late heartbeat" messages or "node dead"
or "node returning after partition" then you definitely need to
adjust timing (keepalive, warntime, and deadtime). Note that the
wording of warnings may be different. Otherwise, if the monitor
operations are timing out, you should adjust the timeouts in the
CIB.
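
As a rough illustration of the timing side, here is a minimal Python
sketch that reads the three directives named above from ha.cf-style text
and checks that they are ordered keepalive < warntime < deadtime. The
sample values, the function names, and the assumption that a value is
either plain seconds or carries an "ms" suffix are mine, not taken from
your configuration, so treat it as a sanity check and not as a tuning
recommendation.

```python
import re

# Hypothetical sample text; the values are illustrative only.
SAMPLE_HA_CF = """\
# timing directives
keepalive 2
warntime 10
deadtime 30
"""

def parse_seconds(value):
    """Convert a value such as '2' or '500ms' to seconds (assumed formats)."""
    if value.endswith("ms"):
        return float(value[:-2]) / 1000.0
    return float(value)

def read_timings(text):
    """Collect keepalive, warntime and deadtime from ha.cf-style text."""
    timings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        match = re.match(r"(keepalive|warntime|deadtime)\s+(\S+)$", line)
        if match:
            timings[match.group(1)] = parse_seconds(match.group(2))
    return timings

def check_timings(timings):
    """Return a list of problems; an empty list means the ordering is sane."""
    missing = [k for k in ("keepalive", "warntime", "deadtime") if k not in timings]
    if missing:
        return ["missing directive: " + name for name in missing]
    problems = []
    if not timings["keepalive"] < timings["warntime"]:
        problems.append("warntime should be larger than keepalive")
    if not timings["warntime"] < timings["deadtime"]:
        problems.append("deadtime should be larger than warntime")
    return problems

if __name__ == "__main__":
    for problem in check_timings(read_timings(SAMPLE_HA_CF)) or ["ordering looks sane"]:
        print(problem)
```

If the nightly backup load is what delays the heartbeats, raising
warntime and deadtime while keeping that ordering is the usual direction
to move in, but the logs should confirm it first.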

Thanks,

Dejan

> 
> TIA,
> Rodrigo
> 
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
