2009/3/2 Simon Horman <[email protected]>:
> On Mon, Mar 02, 2009 at 05:12:57PM +1100, Amos Shapira wrote:
> Hi Amos,
>
> Thanks for your patch. My initial reaction is that it shouldn't
> be necessary as the child processes should receive a SIGKILL when
> the parent exits - though clearly that isn't happening for some reason.

That's what I thought about my patch too: it's a workaround for a problem which shouldn't exist in the first place.

>
> There has been a little bit of work on the child-process handling side
> since 2.1.4. In particular:
>
> http://hg.linux-ha.org/dev/rev/12165
> http://hg.linux-ha.org/dev/rev/12021 <- this change turned out to be
> incomplete
>
> Would it be possible for you to check to see if the problem
> you observed still exists in more recent versions? It should be
> possible to do this without upgrading all of linux-ha.

I'll try to test the first one in the office later today, but both patches seem to deal with zombies, while the processes I see left behind are running just fine. It looks like they never received a signal telling them to exit, and they are left behind still running, with init as their parent.

All our checkers use the internal TCP-checking routine ("protocol=tcp"); some use "http" and some are plain sockets. We don't use external checkers right now.

> http://www.vergenet.net/linux/ldirectord/download.shtml#un-released
>
> Getting back to your patch, a simple fix like that does look quite
> appropriate for the lha-2.1 tree (and distros). But it would also
> be very useful to get a handle on the status of the problem in
> the dev tree.

We'll try to help get to the bottom of this, but we have limited resources (just me and another guy maintaining the entire company's growing network).

BTW, now that I have your confirmation that this patch *looks* all right, we might make it part of our automatic deployment until it gets fixed in the CentOS package (I reported this as a bug+patch for RedHat at https://bugzilla.redhat.com/show_bug.cgi?id=488013). In the meantime we use the following command line to get rid of all the TCP checkers left behind:

  # ps --ppid 1 -o pid=,ppid=,cmd= | awk '{ if ($3 ~ /^tcp:/) print $1}' | xargs -r -t /bin/kill

We might start running it every few minutes from cron for now.

>
> Thanks

Thank you for maintaining this package.
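
A minimal sketch of how the one-liner above could be wrapped for running from cron. It assumes, as the one-liner does, that orphaned TCP checkers have init (PID 1) as their parent and a command line whose first word starts with "tcp:". The script name, the `orphan_checker_pids` function and the `--list`/`--kill` options are hypothetical. The awk filter repeats the parent check (`$2 == 1`) even though `ps --ppid 1` already guarantees it, so that the filter can be tried on canned `ps` output without touching real processes.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. /usr/local/sbin/kill-orphan-checkers.
# Assumption: leftover ldirectord TCP checkers are children of init
# (PPID 1) and their command line starts with "tcp:".

# Read "PID PPID CMD..." lines (as printed by ps -o pid=,ppid=,cmd=)
# on stdin and print the PIDs of orphaned TCP checkers.
orphan_checker_pids() {
    awk '$2 == 1 && $3 ~ /^tcp:/ { print $1 }'
}

# Only touch real processes when explicitly asked to; with no
# argument the script does nothing, so it is safe to source or test.
case "${1:-}" in
    --list)
        # Dry run: show which PIDs would be signalled.
        ps --ppid 1 -o pid=,ppid=,cmd= | orphan_checker_pids
        ;;
    --kill)
        # Same as the original one-liner: xargs -r skips kill when
        # nothing matched, and kill sends the default SIGTERM.
        ps --ppid 1 -o pid=,ppid=,cmd= | orphan_checker_pids | xargs -r /bin/kill
        ;;
esac
```

Run it with `--list` first to check what it would kill. If it looks right, a line like `*/5 * * * * root /usr/local/sbin/kill-orphan-checkers --kill` in a file under /etc/cron.d (which takes a user field) would run it every five minutes.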
>
> --
> Simon Horman
> VA Linux Systems Japan K.K., Sydney, Australia Satellite Office

Cheers,

--Amos
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
