Hi Satoshi-san,

On Thu, Sep 25, 2008 at 07:39:13PM +0900, OKADA Satoshi wrote:
> Hi Dejan,
>
> Thank you for your reply.
>
>> Hi Satoshi-san,
>>
>> On Tue, Sep 09, 2008 at 04:31:25PM +0900, OKADA Satoshi wrote:
>>> Hi,
>>>
>>> I got an unexpected ERROR message when I tested Heartbeat process failure.
>>>
>>> ha.cf:
>>> -----
>>> crm on
>>> use_logd on
>>> keepalive 1
>>> deadtime 10
>>> initdead 40
>>> warntime 5
>>> udpport 694
>>> bcast eth0
>>> node node01
>>> node node02
>>> watchdog /dev/watchdog
>>> -----
>>>
>>> heartbeat version: 2.1.4
>>> OS version: RHEL 5.1
>>>
>>> The test procedure:
>>> 1. start heartbeat
>>>    # /etc/init.d/heartbeat start
>>>
>>> 2. kill heartbeat process
>>>    # kill -9 <"heartbeat: write" or "heartbeat: read" process>
>>>    These processes are restarted.
>>>
>>> 3. stop heartbeat
>>>    # /etc/init.d/heartbeat stop
>>>
>>> I get ERROR messages during this stop process.
>>> ---- ha-log -----
>>> heartbeat[4632]: 2008/09/09_14:43:41 ERROR: Watchdog write magic character failure: closing /dev/watchdog!: Bad file descriptor
>>> heartbeat[4632]: 2008/09/09_14:43:41 ERROR: Watchdog close(2) failed.: Bad file descriptor
>>> -----------------
>>>
>>> I think that this has the same cause as Bugzilla No.1702, and I made a patch.
>>> http://developerbugs.linux-foundation.org/show_bug.cgi?id=1702
>>>
>>> Please check the attached patch.
>>
>> Sorry for the delay on this one.
>>
>> Your patch looks fine to me. Did you test it?
>
> Yes.
>
> I tested some operations, and checked the logs and resource
> status by using crm_mon. I was not able to find any problem.
>
> ---
> The outline of the test:
> Two nodes (Active-Standby)
> watchdog directive in ha.cf
> resources: rscGroup(IPaddr, pgsq, Filesystem)
>
> 1. I tested the behavior of Heartbeat when the target processes did not
>    go down.
>    Target processes are "FIFO reader", "write bcast", "read bcast",
>    "write ping" and "read ping".
>    1-1 A resource fails, and fail-over occurs.
>    1-2 Ping communication fails, and fail-over occurs.
>    1-3 The master control process is killed, and the node is rebooted
>        by the watchdog.
>    1-4 Run Heartbeat continuously for about one hour.
>
> 2. I tested the behavior of Heartbeat when the target processes go down.
>    2-1 The target processes are killed and then restarted.
>        Afterwards, a resource fails, and fail-over occurs.
>    2-2 The "read ping" and "write ping" processes are killed.
>        Afterwards, ping communication fails, and fail-over occurs.
>    2-3 The target processes are killed and then restarted.
>        Afterwards, run Heartbeat continuously for about one hour.
>

Just applied your patch.

Cheers,

Dejan

> Best Regards,
>
> OKADA Satoshi
> NTT Open Source Software Center
> _______________________________________________________
> Linux-HA-Dev: [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
