On Thursday 10 April 2008 12:48:27 Lars Ellenberg wrote:
> On Wed, Apr 09, 2008 at 06:34:39PM +0200, Lars Marowsky-Bree wrote:
> > On 2008-04-08T19:32:58, Bernd Schubert <[EMAIL PROTECTED]> wrote:
> > > Hello,
> > >
> > > I need to set a rather huge dead time of 1200s, but the initial dead
> > > time is supposed to be of 120s or less. However, heartbeat tries to be
> > > schoolmasterly and doesn't want to accept my settings:
> > >
> > > deadtime 1200 # time to declare a node dead
> > > initdead 120  # time to declare a node dead on heartbeat startup
> > > keepalive 120 # how often to send keepalive packets
> >
> > Algorithmic reasons require that initdead be larger than deadtime.
> >
> > keepalive every two minutes and deadtime at 20 minutes is exceptional.
> >
> > Not even Lustre should create a load so high that a realtime priority
> > thread which is entirely locked into memory is not reliably scheduled
> > for 20 minutes at a stretch!
>
> Bernd, are you sure that heartbeat is not scheduled,
> or is it possible that the heartbeat UDP packets just fall on the floor
> because of memory pressure and network congestion, and maybe even
> only heartbeating on the client data network?

I can exclude network congestion, since Lustre goes over Infiniband, while 
heartbeat goes over two independent IP connections, one of which is a direct 
server-to-server connection.
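For reference, a minimal ha.cf fragment that satisfies the initdead > deadtime constraint mentioned above would look like this (keeping the values from my original post; the initdead value here is just an illustrative guess, not a recommendation):

```
keepalive 120   # how often to send keepalive packets
deadtime 1200   # time to declare a node dead (20 min)
initdead 1320   # must be larger than deadtime at heartbeat startup
```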

>
> what I would find out first: is heartbeat not scheduled,
> or do the heartbeats get lost (as you know, udp is unreliable).

It is rather probable that heartbeat is simply not being scheduled, since even 
simple shell commands hang at that point. I have already analyzed the kernel 
traces while Lustre and Linux-md are under high load: almost everything is 
sitting in wait_for_completion(), schedule_timeout() and get_active_stripe().

>
> what does "netstat -su" say?

In the meantime I had to reboot all systems to set these 20 min deadtimes, and 
now I first have to fill the filesystem again (which is not so easy any more, 
since part of the system has gone into production, and filling 400TB from 4 
clients is not so easy ;) ). The high load is reproducible during the unlink() 
of all files (Lustre does a kind of asynchronous unlink operation, so a single 
client will cause the high load on 20 Lustre server systems).
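Once I can rerun the test: the counters that `netstat -su` reports come from /proc/net/snmp on Linux, so a quick before/after snapshot should tell us whether packets are actually being dropped (a sketch; the paths and the "reproduce load" step are placeholders):

```shell
#!/bin/sh
# Snapshot the kernel's UDP counters (InDatagrams, InErrors, RcvbufErrors, ...)
# before and after reproducing the load. Growing error columns would point at
# packet loss rather than heartbeat failing to be scheduled.
grep '^Udp:' /proc/net/snmp > /tmp/udp.before
# ... reproduce the unlink() load here ...
grep '^Udp:' /proc/net/snmp > /tmp/udp.after
diff /tmp/udp.before /tmp/udp.after || true
```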

> what settings do you have for net.core.rmem_max and net.core.wmem_max?
> consider to up that to 8 to 10 MB.

I will increase these once I am convinced I need to.
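For the record, should it turn out to be needed, the suggested buffer sizes could be set persistently like this (8 MB, i.e. the value you suggested above, not something I have measured):

```
# /etc/sysctl.conf -- raise the maximum socket buffer sizes to 8 MB
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
```

applied at runtime with `sysctl -p` (or per key with `sysctl -w`).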


Thanks for your help,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
