On Mon, Mar 02, 2009 at 05:12:57PM +1100, Amos Shapira wrote:
> Hello,
>
> Whenever we fail-over ldirectord we notice that the shut down process
> (on the node which just switched from "primary" to "secondary") leaves
> behind checking children which keep hammering the real servers for no
> benefit.
>
> Here is a simple fix (hack?) to make the children realise that their
> parent is gone and they should exit. They simply check their parent
> process ID and if it's 1 (init) then it means that there is no reason
> for them to hang around.
>
> My environment is CentOS 5.2 x86_64 running inside a Xen DomU with
> heartbeat-ldirectord-2.1.3-3.el5.centos, but the patch below is
> against 2.1.4 from
> http://hg.linux-ha.org/lha-2.1/archive/STABLE-2.1.4.tar.bz2
>
> --- ldirectord/ldirectord.in.orig	2009-03-02 16:59:46.000000000 +1100
> +++ ldirectord/ldirectord.in	2009-03-02 17:03:41.000000000 +1100
> @@ -2311,6 +2311,11 @@
>  			service_set($v, $r, "down", {force => 1});
>  		}
>  		while (1) {
> +			if (getppid() == 1)
> +			{
> +				&ld_log("parent of $$ died; exiting\n");
> +				exit 1;
> +			}
>  			foreach my $r (@$real) {
>  				$0 = "$virtual_id checking $$r{server}";
>  				_check_real($v, $r);
>
> I'd be glad to hear whether you like this patch or think it's an ugly
> workaround which shouldn't be necessary.
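The idea behind the patch (an orphaned checker notices that it has been re-parented and stops) can be tried outside ldirectord with a small standalone Perl script. This is a sketch under stated assumptions, not ldirectord code: `parent_gone` and `checker_loop` are hypothetical names, the "middle" process stands in for the ldirectord parent, and instead of testing `getppid() == 1` as the patch does, it compares against the parent PID recorded before forking. That variation is an assumption chosen because on newer Linux kernels (3.4 and later) an orphan may be adopted by a subreaper rather than by init, in which case its parent PID never becomes 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: true once the parent recorded at fork time is gone.
# Comparing with the recorded PID (not with 1, as in the patch) also covers
# orphans adopted by a subreaper instead of init.
sub parent_gone {
    my ($orig_ppid) = @_;
    return getppid() != $orig_ppid;
}

# Hypothetical checker loop: run $check repeatedly, testing for parent death
# at the top of every pass (the same place the patch puts its check).
sub checker_loop {
    my ($orig_ppid, $check, $interval) = @_;
    while (1) {
        return if parent_gone($orig_ppid);
        $check->();
        select(undef, undef, undef, $interval);    # fractional-second sleep
    }
}

# Pipe so the orphaned checker can report back to the top-level process.
pipe(my $rd, my $wr) or die "pipe: $!";

my $middle = fork() // die "fork: $!";
if ($middle == 0) {
    # Middle process: plays the role of the ldirectord parent.
    my $parent_pid = $$;
    my $child = fork() // die "fork: $!";
    if ($child == 0) {
        # Checker child: poll until the middle process disappears.
        close $rd;
        my $checks = 0;
        checker_loop($parent_pid, sub { $checks++ }, 0.05);
        print {$wr} "parent of $$ died; exiting\n";
        close $wr;
        POSIX::_exit(0);    # skip END blocks and duplicated stdio buffers
    }
    # Parent exits without signalling or reaping its checker child.
    select(undef, undef, undef, 0.2);
    POSIX::_exit(0);
}

close $wr;
waitpid($middle, 0);
my $line = <$rd>;    # blocks until the orphaned checker reports
print $line;
```

As in the patch, the delay before an orphaned checker exits is bounded by one pass of its loop (here the 0.05 s sleep; in ldirectord, one round of checks over the real servers). On Linux, `prctl(PR_SET_PDEATHSIG, ...)` can have the kernel signal the child instead of polling, but core Perl does not expose `prctl`.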
Hi Amos,

Thanks for your patch. My initial reaction is that it shouldn't be
necessary, as the child processes should receive a SIGKILL when the
parent exits, though clearly that isn't happening for some reason.

There has been a little bit of work on the child-process handling
side since 2.1.4. In particular:

http://hg.linux-ha.org/dev/rev/12165
http://hg.linux-ha.org/dev/rev/12021 <- this change turned out to be incomplete

Would it be possible for you to check whether the problem you
observed still exists in more recent versions? It should be possible
to do this without upgrading all of linux-ha.

http://www.vergenet.net/linux/ldirectord/download.shtml#un-released

Getting back to your patch, a simple fix like that does look quite
appropriate for the lha-2.1 tree (and distros). But it would also be
very useful to get a handle on the status of the problem in the dev
tree.

Thanks

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

_______________________________________________
Linux-HA mailing list
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
