Hi,

First of all, I'm new to both Linux-HA and the list - please be
indulgent :)

I'm trying to build a simple test cluster with two nodes and a single
resource (an IP address). My platform is CentOS 5, heartbeat-2.1.3 - I
know it's an old branch, but I hope things haven't changed that much
since then.

My initial problem was hb_gui hanging because mgmtd did not respond on
its TCP socket. But the real problem I ran into is "service heartbeat
stop" hanging indefinitely.

Tracing the "service heartbeat stop" command shows that it sits in a
loop, waiting for the master heartbeat process to exit. The loop looks
like this:
[pid 30029] nanosleep({1, 0}, {1, 0})   = 0
[pid 30029] kill(28429, SIG_0)          = 0
[pid 30029] rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
[pid 30029] rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
[pid 30029] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0

Before the loop starts, the process is sent a SIGTERM:
[pid 30029] kill(28429, SIGTERM)        = 0
... and it apparently ignores the signal.
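
For reference, the stop logic seen in the trace boils down to roughly the
following. This is only a sketch in Python (3.x assumed), not the actual
init script: the names is_alive and stop_and_wait are mine, and the
timeout parameter is my own addition, since the loop I traced has no
timeout at all.

```python
import errno
import os
import signal
import time


def is_alive(pid):
    """Return True if a process with this PID exists (signal 0 probe)."""
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check, nothing is delivered
    except OSError as exc:
        # ESRCH means no such process; EPERM means it exists but is not ours.
        return exc.errno != errno.ESRCH
    return True


def stop_and_wait(pid, timeout=None, interval=1.0):
    """Send SIGTERM once, then poll until the process is gone.

    With timeout=None this waits forever, like the loop in the trace.
    Returns True if the process exited, False if the timeout expired.
    """
    os.kill(pid, signal.SIGTERM)
    deadline = None if timeout is None else time.monotonic() + timeout
    while is_alive(pid):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

Calling stop_and_wait(28429) with no timeout would reproduce the hang I
see: the signal 0 probe keeps succeeding because the process never exits.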

The process looks like this:
[r...@stor-node1 ~]# ps u 28429
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     28429  0.0  4.5  12056 12056 ?        Ss   Sep09   0:00 heartbeat: master control process

The process *can* be traced and it looks pretty much alive (trace output
can be provided, if relevant).
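
One thing that might narrow this down is whether SIGTERM is actually
ignored or blocked by the process, or caught by a handler that then never
exits. On Linux, /proc/<pid>/status exposes this as the hex bitmasks
SigBlk, SigIgn and SigCgt, where bit (n - 1) stands for signal n. A
minimal sketch for decoding them, assuming Python 3 and a mounted /proc
(the function names are mine):

```python
import signal


def parse_signal_masks(status_text):
    """Extract the SigBlk/SigIgn/SigCgt hex bitmasks from /proc/<pid>/status text."""
    masks = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("SigBlk", "SigIgn", "SigCgt"):
            masks[key] = int(value.strip(), 16)
    return masks


def signal_state(masks, signo):
    """Report how the process treats signo; bit (signo - 1) stands for signo."""
    bit = 1 << (signo - 1)
    return {
        "blocked": bool(masks.get("SigBlk", 0) & bit),
        "ignored": bool(masks.get("SigIgn", 0) & bit),
        "caught": bool(masks.get("SigCgt", 0) & bit),
    }


def sigterm_state(pid):
    """Read /proc/<pid>/status and report the SIGTERM disposition."""
    with open("/proc/%d/status" % pid) as fh:
        return signal_state(parse_signal_masks(fh.read()), signal.SIGTERM)
```

In my case this would be called as sigterm_state(28429). A result with
"caught" set would suggest the signal is delivered and the process hangs
somewhere in its own shutdown path rather than ignoring it outright.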

Any suggestion would be appreciated.

Thanks,

Radu Rendec


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
