On Thursday 10 April 2008 12:48:27 Lars Ellenberg wrote:
> On Wed, Apr 09, 2008 at 06:34:39PM +0200, Lars Marowsky-Bree wrote:
> > On 2008-04-08T19:32:58, Bernd Schubert <[EMAIL PROTECTED]> wrote:
> > > Hello,
> > >
> > > I need to set a rather huge dead time of 1200 s, but the initial dead
> > > time is supposed to be 120 s or less. However, heartbeat tries to be
> > > schoolmasterly and doesn't want to accept my settings:
> > >
> > > deadtime 1200    # time to declare a node dead
> > > initdead 120     # time to declare a node dead on heartbeat startup
> > > keepalive 120    # how often to send keepalive packets
> >
> > Algorithmic reasons require that initdead be larger than deadtime.
> >
> > A keepalive every two minutes and a deadtime of 20 minutes is exceptional.
> >
> > Not even Lustre should create a load so high that a realtime-priority
> > thread which is entirely locked into memory is not reliably scheduled
> > for 20 minutes at a stretch!
>
> Bernd, are you sure that heartbeat is not scheduled,
> or is it possible that the heartbeat UDP packets just fall on the floor
> because of memory pressure and network congestion, and maybe you are
> even only heartbeating on the client data network?
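[Editor's note: given the constraint stated above, that initdead must not be smaller than deadtime, a ha.cf fragment heartbeat should accept might look like the following sketch. Only the initdead value differs from the settings quoted above.]

```
# ha.cf -- sketch of a configuration that satisfies initdead >= deadtime.
# All values except initdead are taken from the thread above.
deadtime 1200      # time to declare a node dead
initdead 1200      # raised from 120 to satisfy initdead >= deadtime
keepalive 120      # how often to send keepalive packets
```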
I can exclude network congestion, since Lustre goes over InfiniBand, while
heartbeat goes over two independent IP connections, one of which is a
direct server-to-server link.

> what I would find out first: is heartbeat not scheduled,
> or do the heartbeats get lost (as you know, udp is unreliable).

It is rather probable that heartbeat is simply not scheduled, since even
simple shell commands hang then. I already analyzed the kernel traces while
Lustre and Linux-md were under high load - almost everything is in
wait_for_completion(), schedule_timeout() and get_active_stripe() then.

> what does "netstat -su" say?

In the meantime I had to reboot all systems to set these 20-minute dead
times, and now I first have to fill the filesystem again (which is not so
easy any more, since part of the system is going into production, and
filling 400 TB from 4 clients is not so easy ;) ). The high load happens
reproducibly during unlink() of all files (Lustre does a kind of
asynchronous unlink operation, so a single client will cause the high
load on 20 Lustre server systems).

> what settings do you have for net.core.rmem_max and net.core.wmem_max?
> consider upping them to 8 to 10 MB.

I will increase these once I am convinced I need it.

Thanks for your help,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
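[Editor's note: for later readers, the socket-buffer increase suggested above can be made persistent with a sysctl fragment like the following sketch; 10 MB is picked from the 8 to 10 MB range mentioned in the thread.]

```
# /etc/sysctl.conf (or a file under /etc/sysctl.d/) -- sketch of the
# suggested maximum receive/send socket buffer sizes, roughly 10 MB each
net.core.rmem_max = 10485760
net.core.wmem_max = 10485760
```

Apply with `sysctl -p`, then watch `netstat -su` for growing UDP error counters (e.g. "packet receive errors") to see whether heartbeat packets are being dropped at the socket level.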
