On 10-11-05 06:40 PM, Pavlos Parissis wrote:
> On 5 November 2010 20:32, mike<[email protected]>  wrote:
>    
>> Hi all,
>>
>> I'm running a simple MySQL cluster on a very heavily loaded LPAR and
>> experiencing some outages due to late heartbeat packets, Gmain timeouts
>> and so on.
>>      
> Before we look at the settings, do you know whether keepalives are lost
> due to load on the network (NIC and/or switch) or due to load on the
> system?
>
>    
>> I'd like to adjust these settings:
>>
>> # Thresholds (in seconds)
>>   keepalive                      1
>>   warntime                       6
>>   deadtime                       10
>>   initdead                       15
>>
>> I'm thinking I'd like to change them to this:
>>
>> # Thresholds (in seconds)
>>   keepalive                      60
>>   warntime                       60
>>   deadtime                       120
>>   initdead                       240
>>
>> Anyone see a problem with these settings?
>>      
> Let's see how long it will take for the cluster to detect a node
> failure with the above settings.
> [this is how I understand these settings, so there is a possibility I am wrong]
> 10:00:00 node1 receives a keepalive from node2
> 10:00:01 node2 is down
> 10:01:00 node1 issues the 1st warning
> 10:02:00 node1 detects that node2 is down
>
> Is such a long node failure detection time acceptable in your environment?
>
> Cheers,
> Pavlos
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
Thanks for the reply. Keepalives are definitely lost due to load on the
system. As for the node failure detection time, 2 minutes is fine, as
this is a test environment.
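
As a rough check of the timeline quoted above, here is a minimal Python
sketch. It assumes, as Pavlos does, that warntime and deadtime are both
counted from the last heartbeat received. The function name
detection_timeline and the date are made up for illustration; only the
threshold values come from this thread.

```python
from datetime import datetime, timedelta


def detection_timeline(last_heartbeat, warntime, deadtime):
    """Return when the surviving node warns and when it declares the peer dead.

    Assumption: both thresholds (in seconds) are measured from the last
    heartbeat that was received, as in the timeline quoted above.
    """
    return {
        "warning": last_heartbeat + timedelta(seconds=warntime),
        "dead": last_heartbeat + timedelta(seconds=deadtime),
    }


# Arbitrary illustrative date; 10:00:00 matches the quoted example.
last = datetime(2010, 11, 5, 10, 0, 0)

# (warntime, deadtime) pairs taken from the two configurations in the thread.
settings = {"current": (6, 10), "proposed": (60, 120)}

for label, (warntime, deadtime) in settings.items():
    t = detection_timeline(last, warntime, deadtime)
    print(label, "warning at", t["warning"].time(), "dead at", t["dead"].time())
```

Under that assumption the proposed values give a warning at 10:01:00 and
a dead node at 10:02:00, which is the 2 minutes mentioned above.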
