On 10-11-05 06:40 PM, Pavlos Parissis wrote:
> On 5 November 2010 20:32, mike <[email protected]> wrote:
>
>> Hi all,
>>
>> I'm running a simple MySQL cluster on a very heavily loaded LPAR and
>> experiencing some outages due to late heartbeat packets, Gmain timeouts
>> and so on.
>>
> Before we look at the settings, do you know whether keepalives are lost
> due to load on the network (NIC and/or switch) or due to load on the
> system?
>
>
>> I'd like to adjust these settings:
>>
>> # Thresholds (in seconds)
>> keepalive 1
>> warntime 6
>> deadtime 10
>> initdead 15
>>
>> I'm thinking I'd like to change them to this:
>>
>> # Thresholds (in seconds)
>> keepalive 60
>> warntime 60
>> deadtime 120
>> initdead 240
>>
>> Anyone see a problem with these settings?
>>
> Let's see how long it will take for the cluster to detect a node
> failure with the above settings.
> [This is how I understand these settings, so there is a possibility I am wrong.]
> 10:00:00 node1 receives a keepalive from node2
> 10:00:01 node2 is down
> 10:01:00 node1 issues the 1st warning
> 10:02:00 node1 detects that node2 is down
>
> Is such a long node failure detection time acceptable in your environment?
>
> Cheers,
> Pavlos
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
Thanks for the reply. Keepalives are definitely lost due to load on the
system. As for the node failure detection time, 2 minutes is fine, as
this is a test environment.
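
For reference, the timeline quoted above can be reproduced with a few
lines of Python. This is only a rough sketch: it assumes, as Pavlos
does, that warntime and deadtime are both counted from the last
keepalive received, and the function name detection_timeline is made up
for illustration (it is not part of Heartbeat).

```python
from datetime import datetime, timedelta


def detection_timeline(last_keepalive, warntime, deadtime):
    """Return (warning_at, dead_at) for a peer that went silent.

    Assumption: both thresholds are measured in seconds from the
    last keepalive that was actually received.
    """
    warning_at = last_keepalive + timedelta(seconds=warntime)
    dead_at = last_keepalive + timedelta(seconds=deadtime)
    return warning_at, dead_at


# Last keepalive from node2 arrives at 10:00:00, as in the quoted timeline.
last_seen = datetime(2010, 11, 5, 10, 0, 0)

# Proposed settings: warntime 60, deadtime 120.
warn_at, dead_at = detection_timeline(last_seen, warntime=60, deadtime=120)
print(warn_at.time(), dead_at.time())  # 10:01:00 10:02:00

# Current settings for comparison: warntime 6, deadtime 10.
old_warn_at, old_dead_at = detection_timeline(last_seen, warntime=6, deadtime=10)
print(old_warn_at.time(), old_dead_at.time())  # 10:00:06 10:00:10
```

Under that assumption, the proposed values move detection from about
10 seconds to about 2 minutes after the last keepalive.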