Re: [Linux-HA] network failure and recovery problem

Michael Hale Tue, 08 Sep 2009 09:30:25 -0700

On Tue, Sep 8, 2009 at 2:24 AM, Andrew Beekhof<[email protected]> wrote:
> 2009/9/6 Artur Kamiński <[email protected]>:
>> Elo
>>
>> Sometimes my network fails. If the network goes down and comes back up
>> after a few seconds, the HA cluster does not recover. I do not want to
>> set a high deadtime. How can I fix this?
>
> How about reading the logs you posted?
>
> Go to http://linux-ha.org/FAQ , look for the word "load" and you'll end up at
>   http://linux-ha.org/FAQ#head-f4baa92def24b4f4b9b4e1f16734f05547c872c7
>
> Not exactly rocket science


It may not be rocket science, but that's not a very friendly response.
Aren't we supposed to be encouraging collaboration, communication, and
openness? The power of open source is a strong community. Shutting
people down while showing off your knowledge of the software doesn't
accomplish that.

>
>>
>> crm_mon shows:
>> Node: storage-2 (88d7ff6f-d400-40ef-a215-8cc7a6d29072): OFFLINE
>> Node: storage-1 (7bce6375-3a7f-4ea1-9586-5b6f4c027190): online
>>
>> ha.cf:
>> keepalive 1
>> deadtime 10
>>
>> warntime 3
>>
>> initdead 15
>> ....
>> respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s
>> apiauth ping gid=root uid=root
>>
>>
>>
>> version heartbeat-2_2.1.4-7~bpo50+1_all.deb
>>
>>
>> In the log:
>>
>>
>> heartbeat[11791]: 2009/09/06_16:40:38 info: Link storage-2:eth1 up.
>> heartbeat[11791]: 2009/09/06_16:40:38 info: Link 10.1.131.65:10.1.131.65 up.
>> heartbeat[11791]: 2009/09/06_16:40:38 WARN: Late heartbeat: Node
>> 10.1.131.65: interval 97770 ms
>> heartbeat[11791]: 2009/09/06_16:40:38 info: Status update for node
>> 10.1.131.65: status ping
>> pingd[11800]: 2009/09/06_16:40:38 notice: pingd_lstatus_callback: Status
>> update: Ping node storage-2 now has status [up]
>> pingd[11800]: 2009/09/06_16:40:38 notice: pingd_nstatus_callback: Status
>> update: Ping node storage-2 now has status [up]
>> pingd[11800]: 2009/09/06_16:40:38 notice: pingd_lstatus_callback: Status
>> update: Ping node 10.1.131.65 now has status [up]
>> pingd[11800]: 2009/09/06_16:40:38 notice: pingd_nstatus_callback: Status
>> update: Ping node 10.1.131.65 now has status [up]
>> pingd[11800]: 2009/09/06_16:40:38 info: send_update: 1 active ping nodes
>> pingd[11800]: 2009/09/06_16:40:38 notice: pingd_nstatus_callback: Status
>> update: Ping node 10.1.131.65 now has status [ping]
>> pingd[11800]: 2009/09/06_16:40:38 info: send_update: 1 active ping nodes
>> heartbeat[11791]: 2009/09/06_16:40:38 CRIT: Cluster node storage-2
>> returning after partition.
>> heartbeat[11791]: 2009/09/06_16:40:38 info: For information on cluster
>> partitions, See URL: http://linux-ha.org/SplitBrain
>> heartbeat[11791]: 2009/09/06_16:40:38 WARN: Deadtime value may be too small.
>> heartbeat[11791]: 2009/09/06_16:40:38 info: See FAQ for information on
>> tuning deadtime.
>> heartbeat[11791]: 2009/09/06_16:40:38 info: URL:
>> http://linux-ha.org/FAQ#heavy_load
>> heartbeat[11791]: 2009/09/06_16:40:38 WARN: Late heartbeat: Node
>> storage-2: interval 97760 ms
>> heartbeat[11791]: 2009/09/06_16:40:38 info: Status update for node
>> storage-2: status active
>> pingd[11800]: 2009/09/06_16:40:38 notice: pingd_nstatus_callback: Status
>> update: Ping node storage-2 now has status [active]
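
For anyone reading the log above: the two "Late heartbeat" warnings report
a gap of roughly 97.8 seconds, well beyond the configured deadtime of 10
seconds. Below is a minimal Python sketch for pulling those intervals out
of a log and comparing them with deadtime. The function names and the
SAMPLE string are made up for illustration (they are not part of
Heartbeat), and the pattern assumes the wrapped log lines have been joined
back into single lines.

```python
import re

# Matches Heartbeat "Late heartbeat" warnings in the form seen in the log
# above, once the mail client's line wrapping has been undone.
LATE_RE = re.compile(r"WARN: Late heartbeat: Node (\S+): interval (\d+) ms")


def late_heartbeats(log_text):
    """Return a list of (node, interval_ms) for each late-heartbeat warning."""
    return [(m.group(1), int(m.group(2))) for m in LATE_RE.finditer(log_text)]


def exceeds_deadtime(log_text, deadtime_s):
    """Return only the warnings whose interval is longer than deadtime."""
    limit_ms = deadtime_s * 1000
    return [(node, ms) for node, ms in late_heartbeats(log_text) if ms > limit_ms]


# Hypothetical sample: the two warnings from the posted log, unwrapped.
SAMPLE = (
    "heartbeat[11791]: 2009/09/06_16:40:38 WARN: Late heartbeat: "
    "Node 10.1.131.65: interval 97770 ms\n"
    "heartbeat[11791]: 2009/09/06_16:40:38 WARN: Late heartbeat: "
    "Node storage-2: interval 97760 ms\n"
)

if __name__ == "__main__":
    # deadtime 10 is taken from the posted ha.cf
    for node, ms in exceeds_deadtime(SAMPLE, deadtime_s=10):
        print(f"{node}: silent for {ms / 1000:.2f} s, deadtime is 10 s")
```

If every reported interval is far above deadtime, as it is here, the peer
was declared dead long before the link came back, which matches the
"returning after partition" message in the log.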
>>
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Linux-HA mailing list
>> [email protected]
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems
>>