On Mon, Mar 02, 2009 at 05:12:57PM +1100, Amos Shapira wrote:
> Hello,
> 
> Whenever we fail-over ldirectord we notice that the shut down process
> (on the node which just switched from "primary" to "secondary") leaves
> behind checking children which keep hammering the real servers for no
> benefit.
> 
> Here is a simple fix (hack?) to make the children realise that their
> parent is gone and they should exit. They simply check their parent
> process ID and if it's 1 (init) then it means that there is no reason
> for them to hang around.
> 
> My environment is CentOS 5.2 x86_64 running inside a Xen DomU with
> heartbeat-ldirectord-2.1.3-3.el5.centos, but the patch below is
> against 2.1.4 from
> http://hg.linux-ha.org/lha-2.1/archive/STABLE-2.1.4.tar.bz2
> 
> --- ldirectord/ldirectord.in.orig     2009-03-02 16:59:46.000000000 +1100
> +++ ldirectord/ldirectord.in  2009-03-02 17:03:41.000000000 +1100
> @@ -2311,6 +2311,11 @@
>                  service_set($v, $r, "down", {force => 1});
>          }
>          while (1) {
> +             if (getppid() == 1)
> +             {       
> +                     &ld_log("parent of $$ died; exiting\n");
> +                     exit 1;
> +             }
>                  foreach my $r (@$real) {
>                          $0 = "$virtual_id checking $$r{server}";
>                          _check_real($v, $r);
> 
> I'd be glad to hear whether you like this patch or think it's an ugly
> workaround which shouldn't be necessary.

Hi Amos,

Thanks for your patch. My initial reaction is that it shouldn't
be necessary as the child processes should receive a SIGKILL when
the parent exits - though clearly that isn't happening for some reason.
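
To make that expectation concrete: by default nothing signals the children automatically when a parent exits on Linux, so the parent has to do it itself on the way out. A minimal sketch of that pattern follows. It is not the ldirectord code, and %children, spawn_child() and stop_children() are hypothetical names made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Hypothetical registry of the PIDs of checking children we have forked.
my %children;

# Fork a child that runs $code; the parent records the child's PID.
sub spawn_child {
    my ($code) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        $code->();
        POSIX::_exit(0);    # leave without running the parent's END blocks
    }
    $children{$pid} = 1;
    return $pid;
}

# Signal every recorded child, then reap each one.
# Returns the number of processes that were successfully signalled.
sub stop_children {
    my ($signal) = @_;
    my @pids = keys %children;
    my $signalled = kill($signal, @pids);
    for my $pid (@pids) {
        waitpid($pid, 0);   # blocks until that child has exited
        delete $children{$pid};
    }
    return $signalled;
}
```

The parent would call something like stop_children('TERM') from its shutdown path. If the parent dies without reaching that path, the children are never told, which would match the behaviour you describe.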

There has been a little bit of work on the child-process handling side
since 2.1.4. In particular:

http://hg.linux-ha.org/dev/rev/12165
http://hg.linux-ha.org/dev/rev/12021 <- this change turned out to be incomplete

Could you check whether the problem you observed still exists in
more recent versions? It should be possible to try a newer
ldirectord without upgrading all of linux-ha.

http://www.vergenet.net/linux/ldirectord/download.shtml#un-released

Getting back to your patch, a simple fix like that does look quite
appropriate for the lha-2.1 tree (and distros). But it would also
be very useful to get a handle on the status of the problem in
the dev tree.
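
For what it's worth, a slightly more general form of the same check is to remember the parent's PID before forking and compare against that, rather than against 1, since an orphan is not guaranteed to be adopted by PID 1 on every system. The following is only a sketch under that assumption; parent_is_gone() and check_loop() are hypothetical names and are not taken from ldirectord.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true once the process that forked us is no longer
# our parent, i.e. we have been re-parented to init or something else.
sub parent_is_gone {
    my ($original_ppid) = @_;
    return getppid() != $original_ppid;
}

# Hypothetical stand-in for the per-virtual-service checking loop.
sub check_loop {
    my ($original_ppid, $interval) = @_;
    while (1) {
        if (parent_is_gone($original_ppid)) {
            print STDERR "parent of $$ died; exiting\n";
            exit 1;
        }
        # ... health checks of the real servers would run here ...
        sleep $interval;
    }
}

# Small demonstration: the parent returns straight away and the child
# notices on a later pass through its loop.
sub main {
    my $parent_pid = $$;    # our own PID, recorded before forking
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    check_loop($parent_pid, 1) if $pid == 0;
}

main() unless caller;
```

The trade-off is that the child has to be handed the parent's PID, whereas the getppid() == 1 test in your patch needs no extra state.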

Thanks

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

