On 5 November 2010 20:32, mike <[email protected]> wrote:
> Hi all,
>
> I'm running a simple MySQL cluster on a very heavily loaded LPAR and
> experiencing some outages due to late heartbeat packets, Gmain timeouts
> and so on.

Before we look at the settings, do you know whether the keepalives are lost due to load on the network (NIC and/or switch) or due to load on the system?

> I'd like to adjust these settings:
>
> # Thresholds (in seconds)
> keepalive 1
> warntime 6
> deadtime 10
> initdead 15
>
> I'm thinking I'd like to make it this:
>
> # Thresholds (in seconds)
> keepalive 60
> warntime 60
> deadtime 120
> initdead 240
>
> Anyone see a problem with these settings?

Let's see how long it will take the cluster to detect a node failure with the above settings. [This is how I understand these settings, so it is possible that I am wrong.]

10:00:00 node1 receives a keepalive from node2
10:00:01 node2 goes down
10:01:00 node1 issues the first warning
10:02:00 node1 detects that node2 is down

Is such a long node failure detection time acceptable in your environment?

Cheers,
Pavlos
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
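
For anyone who wants to compare the two sets of thresholds, the timeline in the reply can be reproduced with a few lines of Python. This is only a rough sketch of the arithmetic, assuming (as the reply does) that warntime and deadtime are both counted from the last heartbeat received; the function name failure_timeline is made up for illustration and is not part of Heartbeat.

```python
from datetime import datetime, timedelta


def failure_timeline(last_heartbeat, warntime, deadtime):
    """Return (warn_at, dead_at) for a peer that stops sending heartbeats.

    Assumption: both thresholds are measured in seconds from the
    last heartbeat that was received from the peer.
    """
    warn_at = last_heartbeat + timedelta(seconds=warntime)
    dead_at = last_heartbeat + timedelta(seconds=deadtime)
    return warn_at, dead_at


# Last keepalive from node2 arrives at 10:00:00, as in the reply's example.
last = datetime(2010, 11, 5, 10, 0, 0)

# (label, warntime, deadtime) taken from the current and proposed settings.
for label, warntime, deadtime in [("current", 6, 10), ("proposed", 60, 120)]:
    warn_at, dead_at = failure_timeline(last, warntime, deadtime)
    print(f"{label}: warning at {warn_at:%H:%M:%S}, "
          f"declared dead at {dead_at:%H:%M:%S}")
```

Under that assumption, the current settings declare the node dead 10 seconds after the last heartbeat, while the proposed settings take 2 minutes.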
