On 5 November 2010 20:32, mike <[email protected]> wrote:
> Hi all,
>
> I'm running a simple MySQL cluster on a very heavily loaded LPAR and
> experiencing some outages due to late heartbeat packets, Gmain timeouts
> and so on.

Before we look at the settings, do you know whether the keepalives are
being lost because of load on the network (NIC and/or switch) or
because of load on the system itself?

> I'd like to adjust these settings:
>
> # Thresholds (in seconds)
>  keepalive                      1
>  warntime                       6
>  deadtime                       10
>  initdead                       15
>
> I'm thinking I'd like to make it this:
>
> # Thresholds (in seconds)
>  keepalive                      60
>  warntime                       60
>  deadtime                       120
>  initdead                       240
>
> Anyone see a problem with these settings?

Let's see how long it would take the cluster to detect a node failure
with the proposed settings.
[This is how I understand these settings, so it is possible I am wrong.]
10:00:00 node1 receives a keepalive from node2
10:00:01 node2 is down
10:01:00 node1 issues the 1st warning
10:02:00 node1 detects that node2 is down
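
The same arithmetic as a rough Python sketch, assuming (as in the
timeline above) that warntime and deadtime are both counted from the
last keepalive received. failure_timeline is only a name made up for
this illustration, it is not part of Heartbeat.

```python
from datetime import datetime, timedelta


def failure_timeline(last_heartbeat, warntime, deadtime):
    """Return (warning_time, dead_time), both counted from the last keepalive.

    Assumption: warntime and deadtime are measured in seconds from the
    last keepalive received, as in the timeline in this mail.
    """
    warn_at = last_heartbeat + timedelta(seconds=warntime)
    dead_at = last_heartbeat + timedelta(seconds=deadtime)
    return warn_at, dead_at


# Last keepalive from node2 arrives at 10:00:00, node2 dies right after.
last = datetime(2010, 11, 5, 10, 0, 0)

for label, warntime, deadtime in (("current", 6, 10), ("proposed", 60, 120)):
    warn_at, dead_at = failure_timeline(last, warntime, deadtime)
    print(f"{label}: warning at {warn_at:%H:%M:%S}, "
          f"declared dead at {dead_at:%H:%M:%S}")
```

With the current values node2 would be declared dead at 10:00:10, with
the proposed ones at 10:02:00, so in the worst case detection takes
about deadtime seconds.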

Is such a long node failure detection time acceptable in your environment?

Cheers,
Pavlos
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems