Re: [Linux-ha-dev] Heartbeat process failure and log message

Dejan Muhamedagic Thu, 25 Sep 2008 04:29:25 -0700

Hi Satoshi-san,

On Thu, Sep 25, 2008 at 07:39:13PM +0900, OKADA Satoshi wrote:
> Hi Dejan,
>
>
> Thank you for your reply.
>
>> Hi Satoshi-san,
>>
>> On Tue, Sep 09, 2008 at 04:31:25PM +0900, OKADA Satoshi wrote:
>>> Hi,
>>>
>>> I got an unexpected ERROR message when I tested Heartbeat process failure.
>>>
>>> ha.cf:
>>> -----
>>> crm on
>>> use_logd on
>>> keepalive 1
>>> deadtime 10
>>> initdead 40
>>> warntime 5
>>> udpport 694
>>> bcast eth0
>>> node node01
>>> node node02
>>> watchdog /dev/watchdog
>>> -----
>>>
>>> heartbeat version: 2.1.4
>>> OS version: RHEL 5.1
>>>
>>> The test procedure:
>>> 1. start heartbeat
>>> # /etc/init.d/heartbeat start
>>>
>>> 2. kill heartbeat process
>>> # kill -9 <"heartbeat: write" or "heartbeat: read" process>
>>> These processes are restarted.
>>>
>>> 3. stop heartbeat
>>> # /etc/init.d/heartbeat stop
>>>
>>> I get the following ERROR messages during this stop.
>>> ---- ha-log -----
>>> heartbeat[4632]: 2008/09/09_14:43:41 ERROR: Watchdog write
>>> magic character failure: closing /dev/watchdog!: Bad file descriptor
>>> heartbeat[4632]: 2008/09/09_14:43:41 ERROR: Watchdog close(2)
>>> failed.: Bad file descriptor
>>> -----------------
>>>
>>> I think this has the same cause as Bugzilla No. 1702, so I made a patch.
>>> http://developerbugs.linux-foundation.org/show_bug.cgi?id=1702
>>>
>>> Please check attached patch.
>>
>> Sorry for the delay on this one.
>>
>> Your patch looks fine to me. Did you test it?
>
>
> Yes.
>
> I tested several operations and checked the logs and resource
> status using crm_mon. I did not find any problems.
>
>
> ---
> the outline of test:
>  Two node (Active-Standby)
>  watchdog directive in ha.cf
>  resources:rscGroup(IPaddr, pgsq, Filesystem)
>
>   1. I tested the behavior of Heartbeat when the target processes had not
>      gone down.
>     Target processes are "FIFO reader", "write bcast", "read bcast",
>     "write ping" and "read ping".
>     1-1 A resource fails, and fail-over occurs.
>     1-2 Ping communication fails, and fail-over occurs.
>     1-3 The master control process is killed, and the node is rebooted
>         by the watchdog.
>     1-4 Heartbeat runs continuously for about one hour.
>
>   2. I tested the behavior of Heartbeat when the target processes went down.
>     2-1 The target processes are killed and then restarted.
>         Afterwards, a resource fails, and fail-over occurs.
>     2-2 The "read ping" and "write ping" processes are killed.
>         Afterwards, ping communication fails, and fail-over occurs.
>     2-3 The target processes are killed and then restarted.
>         Afterwards, Heartbeat runs continuously for about one hour.
>
>


Just applied your patch.

Cheers,

Dejan

>
> Best Regards,
>
> OKADA Satoshi
> NTT Open Source Software Center
> _______________________________________________________
> Linux-HA-Dev: [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
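
Note for archive readers: both log lines quoted above end in "Bad file
descriptor" (EBADF), which is what write(2) and close(2) report when the
descriptor is not open in the calling process. The sketch below is not the
patch applied in this thread and is not Heartbeat code. It only illustrates,
under stated assumptions, how a shutdown path can guard the Linux watchdog
"magic close" (writing the character 'V' before closing the device). The
helper name disarm_watchdog is hypothetical, and a pipe stands in for
/dev/watchdog so the example can run without the device.

```python
import errno
import os

MAGIC_CLOSE = b"V"  # Linux watchdog "magic close" character


def disarm_watchdog(fd):
    """Write the magic character to fd, then close it.

    Hypothetical helper for illustration only. A negative fd means
    "this process does not hold the device open", so nothing is attempted.
    Returns (new_fd, errors): new_fd is always -1 so the caller cannot
    reuse the descriptor, and errors lists the calls that failed.
    """
    errors = []
    if fd < 0:
        return -1, errors
    try:
        os.write(fd, MAGIC_CLOSE)
    except OSError as exc:
        errors.append("write: " + errno.errorcode.get(exc.errno, str(exc.errno)))
    try:
        os.close(fd)
    except OSError as exc:
        errors.append("close: " + errno.errorcode.get(exc.errno, str(exc.errno)))
    return -1, errors
```

Calling the helper on a descriptor that has already been closed reports both
a write and a close failure with EBADF, the same pair of failures seen in the
ha-log excerpt. Calling it with -1 does nothing and reports no errors.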
