Re: [Linux-HA] network failure and recovery problem

Artur Kamiński Wed, 30 Sep 2009 10:30:05 -0700
Michael Hale wrote:
> On Tue, Sep 8, 2009 at 2:24 AM, Andrew Beekhof<[email protected]> wrote:
>   
>> 2009/9/6 Artur Kamiński <[email protected]>:
>>     
>>> Hello
>>>
>>> Sometimes my network fails. If the network goes down and comes back up
>>> after a few seconds, the HA cluster does not recover. I do not want to
>>> set a high deadtime. How can I fix this?
>>>       
>> How about reading the logs you posted?
>>
>> Go to http://linux-ha.org/FAQ , look for the word "load" and you'll end up at
>>   http://linux-ha.org/FAQ#head-f4baa92def24b4f4b9b4e1f16734f05547c872c7
>>
>> Not exactly rocket science
>>     
>
> It may not be rocket science, but that's not a very friendly response.
> Aren't we supposed to be encouraging collaboration, communication, and
> openness? The power of open source is a strong community. Shutting
> people down while showing off your knowledge of the software doesn't
> accomplish that.
>
>   
>>> Output of crm_mon:
>>> Node: storage-2 (88d7ff6f-d400-40ef-a215-8cc7a6d29072): OFFLINE
>>> Node: storage-1 (7bce6375-3a7f-4ea1-9586-5b6f4c027190): online
>>>
>>> ha.cf:
>>> keepalive 1
>>> deadtime 10
>>>
>>> warntime 3
>>>
>>> initdead 15
>>> ....
>>> respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s
>>> apiauth ping gid=root uid=root
>>>
>>>
>>>
>>> version heartbeat-2_2.1.4-7~bpo50+1_all.deb
>>>
>>>
>>> From the log:
>>>
>>>
>>> heartbeat[11791]: 2009/09/06_16:40:38 info: Link storage-2:eth1 up.
>>> heartbeat[11791]: 2009/09/06_16:40:38 info: Link 10.1.131.65:10.1.131.65 up.
>>> heartbeat[11791]: 2009/09/06_16:40:38 WARN: Late heartbeat: Node 10.1.131.65: interval 97770 ms
>>> heartbeat[11791]: 2009/09/06_16:40:38 info: Status update for node 10.1.131.65: status ping
>>> pingd[11800]: 2009/09/06_16:40:38 notice: pingd_lstatus_callback: Status update: Ping node storage-2 now has status [up]
>>> pingd[11800]: 2009/09/06_16:40:38 notice: pingd_nstatus_callback: Status update: Ping node storage-2 now has status [up]
>>> pingd[11800]: 2009/09/06_16:40:38 notice: pingd_lstatus_callback: Status update: Ping node 10.1.131.65 now has status [up]
>>> pingd[11800]: 2009/09/06_16:40:38 notice: pingd_nstatus_callback: Status update: Ping node 10.1.131.65 now has status [up]
>>> pingd[11800]: 2009/09/06_16:40:38 info: send_update: 1 active ping nodes
>>> pingd[11800]: 2009/09/06_16:40:38 notice: pingd_nstatus_callback: Status update: Ping node 10.1.131.65 now has status [ping]
>>> pingd[11800]: 2009/09/06_16:40:38 info: send_update: 1 active ping nodes
>>> heartbeat[11791]: 2009/09/06_16:40:38 CRIT: Cluster node storage-2 returning after partition.
>>> heartbeat[11791]: 2009/09/06_16:40:38 info: For information on cluster partitions, See URL: http://linux-ha.org/SplitBrain
>>> heartbeat[11791]: 2009/09/06_16:40:38 WARN: Deadtime value may be too small.
>>> heartbeat[11791]: 2009/09/06_16:40:38 info: See FAQ for information on tuning deadtime.
>>> heartbeat[11791]: 2009/09/06_16:40:38 info: URL: http://linux-ha.org/FAQ#heavy_load
>>> heartbeat[11791]: 2009/09/06_16:40:38 WARN: Late heartbeat: Node storage-2: interval 97760 ms
>>> heartbeat[11791]: 2009/09/06_16:40:38 info: Status update for node storage-2: status active
>>> pingd[11800]: 2009/09/06_16:40:38 notice: pingd_nstatus_callback: Status update: Ping node storage-2 now has status [active]
>>>
>>> _______________________________________________
>>> Linux-HA mailing list
>>> [email protected]
>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>> See also: http://linux-ha.org/ReportingProblems
>>>
>>>       
I don't have a problem with high load. The problem is that my network
often goes down, and I want heartbeat to recover properly once such a
failure is over. After an outage like this, heartbeat often does not
rejoin the cluster, and I end up with two masters. If I set a high
deadtime, then when a server really fails, heartbeat waits a long time
before switching over.
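The timings in the quoted log can be compared against the quoted ha.cf with a short Python sketch. It is not part of heartbeat: the helper names (parse_timers, late_intervals, check) are hypothetical, the parser only understands plain integer-second values like the ones quoted and skips anything else, and the initdead rule is the guideline from heartbeat's sample ha.cf that initdead should be at least twice deadtime. On the quoted data, the "Late heartbeat" warnings suggest the peer was silent for about 97.8 seconds, close to ten times the 10-second deadtime, so the outage appears to have lasted much longer than a few seconds.

```python
import re

# Timer directives copied from the ha.cf quoted above (values in seconds).
HA_CF = """\
keepalive 1
deadtime 10
warntime 3
initdead 15
"""

# The two "Late heartbeat" warnings from the quoted log, unwrapped.
LOG = """\
heartbeat[11791]: 2009/09/06_16:40:38 WARN: Late heartbeat: Node 10.1.131.65: interval 97770 ms
heartbeat[11791]: 2009/09/06_16:40:38 WARN: Late heartbeat: Node storage-2: interval 97760 ms
"""

LATE_RE = re.compile(r"Late heartbeat: Node (\S+): interval (\d+) ms")


def parse_timers(text):
    """Hypothetical helper: map 'name value' lines to integer seconds.

    Only plain integer values are handled; any other line is skipped.
    """
    timers = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            timers[parts[0]] = int(parts[1])
    return timers


def late_intervals(text):
    """Hypothetical helper: map node name -> heartbeat gap in seconds."""
    return {node: int(ms) / 1000 for node, ms in LATE_RE.findall(text)}


def check(timers, gaps):
    """Hypothetical helper: return human-readable findings."""
    findings = []
    deadtime = timers["deadtime"]
    # Guideline from heartbeat's sample ha.cf: initdead >= 2 * deadtime.
    if timers["initdead"] < 2 * deadtime:
        findings.append(
            f"initdead {timers['initdead']}s is below 2 x deadtime ({2 * deadtime}s)"
        )
    # A gap longer than deadtime means the peer was declared dead.
    for node, gap in sorted(gaps.items()):
        if gap > deadtime:
            findings.append(
                f"{node}: no heartbeat for {gap:.2f}s ({gap / deadtime:.1f} x deadtime)"
            )
    return findings


if __name__ == "__main__":
    for finding in check(parse_timers(HA_CF), late_intervals(LOG)):
        print(finding)
```

With the quoted values it reports the initdead guideline plus one line each for 10.1.131.65 and storage-2.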