On Wed, Apr 09, 2008 at 06:34:39PM +0200, Lars Marowsky-Bree wrote:
> On 2008-04-08T19:32:58, Bernd Schubert <[EMAIL PROTECTED]> wrote:
> 
> > Hello,
> > 
> > I need to set a rather huge dead time of 1200s, but the initial dead time
> > is supposed to be 120s or less. However, heartbeat tries to be
> > schoolmasterly and doesn't want to accept my settings:
> > 
> > deadtime 1200 # time to declare a node dead
> > initdead 120  # time to declare a node dead on heartbeat startup
> > keepalive 120 # how often to send keepalive packets
> 
> Algorithmic reasons require that initdead be larger than deadtime.
> 
> keepalive every two minutes and deadtime at 20 minutes is exceptional.
> 
> Not even Lustre should create a load so high that a realtime priority
> thread which is entirely locked into memory is not reliably scheduled
> for 20 minutes at a stretch!

Bernd, are you sure that heartbeat is not being scheduled?
Or is it possible that the heartbeat UDP packets are simply dropped
because of memory pressure and network congestion, perhaps because you
are heartbeating only over the client data network?

The first thing I would find out: is heartbeat not getting scheduled,
or are the heartbeats getting lost (as you know, UDP is unreliable)?

What does "netstat -su" say?
What settings do you have for net.core.rmem_max and net.core.wmem_max?
Consider raising both to 8 to 10 MB.
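
If you want to collect those numbers on several nodes, something like the
following might help. It is only a rough sketch: the function names and the
8 MB threshold constant are made up for illustration, and it assumes the
net-tools style of "netstat -su" output, where the "Udp:" section lists
indented "<count> <description>" lines. Other netstat versions may format
the counters differently.

```python
import re
import subprocess
from pathlib import Path

# Lower end of the 8 to 10 MB range suggested above (illustrative constant).
SUGGESTED_MIN_BYTES = 8 * 1024 * 1024


def udp_counters(netstat_su_output: str) -> dict:
    """Collect '<count> <description>' lines from the 'Udp:' section.

    Assumes net-tools style output; section headers start in column 0
    and counters are indented below them.
    """
    counters = {}
    in_udp = False
    for line in netstat_su_output.splitlines():
        if not line.startswith((" ", "\t")):
            # A non-indented line is a section header (or a blank line).
            in_udp = line.strip() == "Udp:"
            continue
        if in_udp:
            match = re.match(r"\s+(\d+)\s+(.+)", line)
            if match:
                counters[match.group(2).strip()] = int(match.group(1))
    return counters


def read_sysctl(name: str, root: str = "/proc/sys") -> int:
    """Read an integer sysctl such as 'net.core.rmem_max' from /proc/sys."""
    path = Path(root) / name.replace(".", "/")
    return int(path.read_text().split()[0])


def report() -> None:
    """Print UDP counters that mention errors, plus the buffer limits."""
    output = subprocess.run(
        ["netstat", "-su"], capture_output=True, text=True, check=True
    ).stdout
    for description, count in udp_counters(output).items():
        if "error" in description:
            print(f"{count:>12}  {description}")
    for name in ("net.core.rmem_max", "net.core.wmem_max"):
        value = read_sysctl(name)
        note = "" if value >= SUGGESTED_MIN_BYTES else "  (below 8 MB)"
        print(f"{name} = {value}{note}")
```

Calling report() on a node prints the UDP error counters and both buffer
limits. Error counters that keep growing while heartbeats go missing would
point at lost packets rather than at a scheduling problem.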

-- 
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
