Hi,
First of all, I'm new to both Linux-HA and the list - please be
indulgent :)
I'm trying to build a simple test cluster with two nodes and a single
resource (an IP address). My platform is CentOS 5 with heartbeat-2.1.3. I
know it's an old branch, but I hope things haven't changed that much
since then.
My initial problem was hb_gui hanging because mgmtd did not respond on
the TCP socket. But the real problem I ran into is "service heartbeat
stop" hanging indefinitely.
Tracing the "service heartbeat stop" command reveals that what actually
happens is a loop waiting for the master heartbeat process to quit. The
loop looks like this:
[pid 30029] nanosleep({1, 0}, {1, 0}) = 0
[pid 30029] kill(28429, SIG_0) = 0
[pid 30029] rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
[pid 30029] rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
[pid 30029] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
Before the loop starts, the process is sent a SIGTERM:
[pid 30029] kill(28429, SIGTERM) = 0
... and it apparently ignores the signal.
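
To make the traced behaviour concrete, here is a minimal Python sketch of
what the stop sequence amounts to: one SIGTERM, then a once-per-second
existence probe with signal 0. It is not the actual init script code. The
function names and the timeout are my own additions for illustration; the
loop in the trace has no timeout, which is why it never returns.

```python
import errno
import os
import signal
import time


def process_exists(pid):
    """Probe for a PID with signal 0, like the kill(28429, SIG_0) call in the trace."""
    try:
        os.kill(pid, 0)
    except OSError as exc:
        if exc.errno == errno.ESRCH:
            return False  # no such process
        if exc.errno == errno.EPERM:
            return True   # process exists but belongs to another user
        raise
    return True


def stop_and_wait(pid, timeout=30.0, interval=1.0):
    """Send SIGTERM once, then poll until the process is gone.

    Returns True if the process disappeared, False if it was still
    present when the (hypothetical) timeout expired.
    """
    os.kill(pid, signal.SIGTERM)
    deadline = time.time() + timeout
    while process_exists(pid):
        if time.time() >= deadline:
            return False
        time.sleep(interval)
    return True
```

Note that the signal 0 probe also succeeds for a zombie that has not been
reaped yet, so a "still exists" result alone does not prove that the
process is running.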
The process looks like this:
[r...@stor-node1 ~]# ps u 28429
USER       PID %CPU %MEM   VSZ   RSS TTY  STAT START  TIME COMMAND
root     28429  0.0  4.5 12056 12056 ?    Ss   Sep09  0:00 heartbeat: master control process
The process *can* be traced and it looks pretty much alive (trace output
can be provided, if relevant).
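
One way to check whether SIGTERM is really being ignored, rather than
caught by a handler that never finishes, is to decode the signal masks
the kernel exposes in /proc/<pid>/status. The following is a rough sketch,
assuming the SigBlk, SigIgn and SigCgt lines are present on this kernel
and a Python recent enough for str.partition and the "with" statement.
The helper names are made up for illustration.

```python
import signal


def sigterm_disposition(status_text, signum=signal.SIGTERM):
    """Report whether a signal is blocked, ignored or caught.

    status_text is the content of /proc/<pid>/status. Each mask is a
    hexadecimal bitmap in which bit (signum - 1) stands for signal signum.
    """
    masks = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("SigBlk", "SigIgn", "SigCgt"):
            masks[key] = int(value.strip(), 16)
    bit = 1 << (signum - 1)
    return {
        "blocked": bool(masks.get("SigBlk", 0) & bit),
        "ignored": bool(masks.get("SigIgn", 0) & bit),
        "caught": bool(masks.get("SigCgt", 0) & bit),
    }


def read_status(pid):
    """Read /proc/<pid>/status for a given PID (Linux only)."""
    with open("/proc/%d/status" % pid) as handle:
        return handle.read()
```

For the process above this would be called as
sigterm_disposition(read_status(28429)). If "caught" is set, the signal is
delivered to a handler and the shutdown is stuck somewhere after that; if
"ignored" or "blocked" is set, the SIGTERM never has any effect.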
Any suggestion would be appreciated.
Thanks,
Radu Rendec