Good day.

I happened to drive into the following bug with CFEngine 2.x: when cfrun
is used to initiate cfagents on the servers and it is spawning many
children (maxchild is set), then the single cfrun instance that was
forked to hail the specific server and run into a timeout, will kill the
whole process group.  So all cfrun horde, including the master process
will be killed.

I am facing this issue when some of the nodes I'm maintaining with
CFEngine were failed, so the connection isn't going to be established
and cfrun will wait for it (virtually) indefinite.  The problem is that
ALARM_PID, being the uninitialized static variable, will be set to 0.
So, when SIGALARM will be delivered to one of the children, it will
issue 'kill(0, SIGTERM)' and this means "kill entire process group".

The attached patch heals the situation.  It works both on Linux/FreeBSD
powered CFEngine servers running 2.2.9.  The patch tries to initialize
ALARM_PID to -1 in all relevant places, not only in globals.c.

And another small patch that fixes a typo is attached too.
-- 
Eygene Ryabinkin, Russian Research Centre "Kurchatov Institute"
From a5076606896c7183a47a9cf5ea131daf9fbdef09 Mon Sep 17 00:00:00 2001
From: Eygene Ryabinkin <r...@grid.kiae.ru>
Date: Fri, 6 Feb 2009 16:28:09 +0300
Subject: [PATCH] Alarm signal fixes

Some parts of CFEngine code install the SIGALARM handlers that are
inspecting ALARM_PID variable and killing the respective PID when
the signal is delivered.  One should be careful to not have ALARM_PID
of zero when the signal is delivered, because kill(0, SIGNAL) will
act on the whole process group and this isn't desirable at all.

So, we
 1. must initialize ALARM_PID to -1 from the beginning (globals.c);
 2. should not blindly set ALARM_PID = pid, where 'pid' is the result
    of 'fork()' call (popen.c);
 3. initialize ALARM_PID back to -1 for the child process right after fork:
    there is no point in doing this:
    - when fork() will be followed by exec*() calls, there will be a time
      frame between those and alarm could come in this time frame;
    - when we will continue some activity inside the forked process,
      we won't need to duplicate parent's ALARM_PID -- parent will handle
      alarms by itself.

Signed-off-by: Eygene Ryabinkin <r...@grid.kiae.ru>
---
 src/cfrun.c   |    1 +
 src/do.c      |    2 ++
 src/globals.c |    2 +-
 src/misc.c    |    1 +
 src/popen.c   |    8 ++++----
 5 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/src/cfrun.c b/src/cfrun.c
index 7620ab8..210d0fb 100644
--- a/src/cfrun.c
+++ b/src/cfrun.c
@@ -98,6 +98,7 @@ while (ip != NULL)
       {
       if (fork() == 0) /* child */
          {
+         ALARM_PID = -1;
          printf("cfrun(%d):         .......... [ Hailing %s ] ..........\n",i,ip->name);
          Debug("pid = %d i = %d\n", getpid(), i);
          
diff --git a/src/do.c b/src/do.c
index 9800cd6..155514c 100644
--- a/src/do.c
+++ b/src/do.c
@@ -1311,6 +1311,8 @@ for (ptr = VSCRIPT; ptr != NULL; ptr=ptr->next)
          {
          Verbose("Backgrounding job %s\n",execstr);
          outsourced = fork();
+         if (outsourced == 0)
+            ALARM_PID = -1;
          }
       else
          {
diff --git a/src/globals.c b/src/globals.c
index 7f72d1d..49f2640 100644
--- a/src/globals.c
+++ b/src/globals.c
@@ -158,7 +158,7 @@ pthread_mutex_t MUTEX_LOCK = PTHREAD_MUTEX_INITIALIZER;
   PUBLIC char FORK = 'n';
 
   PRIVATE   int RPCTIMEOUT = 60;          /* seconds */
-  PUBLIC    pid_t ALARM_PID;
+  PUBLIC    pid_t ALARM_PID = -1;
   PROTECTED int SENSIBLEFILECOUNT = 2;
   PROTECTED int SENSIBLEFSSIZE = 1000;
 
diff --git a/src/misc.c b/src/misc.c
index f62652a..bee5288 100644
--- a/src/misc.c
+++ b/src/misc.c
@@ -270,6 +270,7 @@ if ((pid = fork()) < 0)
    }
 else if (pid == 0)                     /* child */
    {
+   ALARM_PID = -1;
    if (useshell)
       {
       if (execl("/bin/sh","sh","-c",comm,NULL) == -1)
diff --git a/src/popen.c b/src/popen.c
index 61eed8a..3613904 100644
--- a/src/popen.c
+++ b/src/popen.c
@@ -104,7 +104,7 @@ if ((pid = fork()) == -1)
 
 signal(SIGCHLD,SIG_DFL);
 
-ALARM_PID = pid;
+ALARM_PID = (pid != 0 ? pid : -1);
 
 if (pid == 0)
    {
@@ -246,7 +246,7 @@ if ((pid = fork()) == -1)
    return NULL;
    }
 
-ALARM_PID = pid;
+ALARM_PID = (pid != 0 ? pid : -1);
 
 if (pid == 0)
    {
@@ -414,7 +414,7 @@ if ((pid = fork()) == -1)
    return NULL;
    }
 
-ALARM_PID = pid;
+ALARM_PID = (pid != 0 ? pid : -1);
 
 if (pid == 0)
    {
@@ -533,7 +533,7 @@ if ((pid = fork()) == -1)
    return NULL;
    }
 
-ALARM_PID = pid;
+ALARM_PID = (pid != 0 ? pid : -1);
 
 if (pid == 0)
    {
-- 
1.6.1

From 772af652a3b78fca88e0d777c10db84a61166efc Mon Sep 17 00:00:00 2001
From: Eygene Ryabinkin <r...@grid.kiae.ru>
Date: Fri, 6 Feb 2009 16:36:51 +0300
Subject: [PATCH] CFRun: fix typo

Signed-off-by: Eygene Ryabinkin <r...@grid.kiae.ru>
---
 src/cfrun.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/cfrun.c b/src/cfrun.c
index 210d0fb..02fa497 100644
--- a/src/cfrun.c
+++ b/src/cfrun.c
@@ -430,7 +430,7 @@ if (!gotkey)
  
 if (!RemoteConnect(parsed_host,forceipv4,port,strport))
    {
-   snprintf(OUTPUT,CF_BUFSIZE,"No reponse hailing %s\n",parsed_host);
+   snprintf(OUTPUT,CF_BUFSIZE,"No response hailing %s\n",parsed_host);
    CfLog(cferror,OUTPUT,"socket");
    FileOutput(fp, fopl_error,OUTPUT);
    if (CONN->sd != CF_NOT_CONNECTED)
-- 
1.6.1

_______________________________________________
Bug-cfengine mailing list
Bug-cfengine@cfengine.org
https://cfengine.org/mailman/listinfo/bug-cfengine

Reply via email to