Good day. I happened to drive into the following bug with CFEngine 2.x: when cfrun is used to initiate cfagents on the servers and it is spawning many children (maxchild is set), then the single cfrun instance that was forked to hail the specific server and run into a timeout, will kill the whole process group. So all cfrun horde, including the master process will be killed.
I am facing this issue when some of the nodes I'm maintaining with CFEngine were failed, so the connection isn't going to be established and cfrun will wait for it (virtually) indefinite. The problem is that ALARM_PID, being the uninitialized static variable, will be set to 0. So, when SIGALARM will be delivered to one of the children, it will issue 'kill(0, SIGTERM)' and this means "kill entire process group". The attached patch heals the situation. It works both on Linux/FreeBSD powered CFEngine servers running 2.2.9. The patch tries to initialize ALARM_PID to -1 in all relevant places, not only in globals.c. And another small patch that fixes a typo is attached too. -- Eygene Ryabinkin, Russian Research Centre "Kurchatov Institute"
From a5076606896c7183a47a9cf5ea131daf9fbdef09 Mon Sep 17 00:00:00 2001 From: Eygene Ryabinkin <r...@grid.kiae.ru> Date: Fri, 6 Feb 2009 16:28:09 +0300 Subject: [PATCH] Alarm signal fixes Some parts of CFEngine code install the SIGALARM handlers that are inspecting ALARM_PID variable and killing the respective PID when the signal is delivered. One should be careful to not have ALARM_PID of zero when the signal is delivered, because kill(0, SIGNAL) will act on the whole process group and this isn't desirable at all. So, we 1. must initialize ALARM_PID to -1 from the beginning (globals.c); 2. should not blindly set ALARM_PID = pid, where 'pid' is the result of 'fork()' call (popen.c); 3. initialize ALARM_PID back to -1 for the child process right after fork: there is no point in doing this: - when fork() will be followed by exec*() calls, there will be a time frame between those and alarm could come in this time frame; - when we will continue some activity inside the forked process, we won't need to duplicate parent's ALARM_PID -- parent will handle alarms by itself. Signed-off-by: Eygene Ryabinkin <r...@grid.kiae.ru> --- src/cfrun.c | 1 + src/do.c | 2 ++ src/globals.c | 2 +- src/misc.c | 1 + src/popen.c | 8 ++++---- 5 files changed, 9 insertions(+), 5 deletions(-) diff --git a/src/cfrun.c b/src/cfrun.c index 7620ab8..210d0fb 100644 --- a/src/cfrun.c +++ b/src/cfrun.c @@ -98,6 +98,7 @@ while (ip != NULL) { if (fork() == 0) /* child */ { + ALARM_PID = -1; printf("cfrun(%d): .......... [ Hailing %s ] ..........\n",i,ip->name); Debug("pid = %d i = %d\n", getpid(), i); diff --git a/src/do.c b/src/do.c index 9800cd6..155514c 100644 --- a/src/do.c +++ b/src/do.c @@ -1311,6 +1311,8 @@ for (ptr = VSCRIPT; ptr != NULL; ptr=ptr->next) { Verbose("Backgrounding job %s\n",execstr); outsourced = fork(); + if (outsourced == 0) + ALARM_PID = -1; } else { diff --git a/src/globals.c b/src/globals.c index 7f72d1d..49f2640 100644 --- a/src/globals.c +++ b/src/globals.c @@ -158,7 +158,7 @@ pthread_mutex_t MUTEX_LOCK = PTHREAD_MUTEX_INITIALIZER; PUBLIC char FORK = 'n'; PRIVATE int RPCTIMEOUT = 60; /* seconds */ - PUBLIC pid_t ALARM_PID; + PUBLIC pid_t ALARM_PID = -1; PROTECTED int SENSIBLEFILECOUNT = 2; PROTECTED int SENSIBLEFSSIZE = 1000; diff --git a/src/misc.c b/src/misc.c index f62652a..bee5288 100644 --- a/src/misc.c +++ b/src/misc.c @@ -270,6 +270,7 @@ if ((pid = fork()) < 0) } else if (pid == 0) /* child */ { + ALARM_PID = -1; if (useshell) { if (execl("/bin/sh","sh","-c",comm,NULL) == -1) diff --git a/src/popen.c b/src/popen.c index 61eed8a..3613904 100644 --- a/src/popen.c +++ b/src/popen.c @@ -104,7 +104,7 @@ if ((pid = fork()) == -1) signal(SIGCHLD,SIG_DFL); -ALARM_PID = pid; +ALARM_PID = (pid != 0 ? pid : -1); if (pid == 0) { @@ -246,7 +246,7 @@ if ((pid = fork()) == -1) return NULL; } -ALARM_PID = pid; +ALARM_PID = (pid != 0 ? pid : -1); if (pid == 0) { @@ -414,7 +414,7 @@ if ((pid = fork()) == -1) return NULL; } -ALARM_PID = pid; +ALARM_PID = (pid != 0 ? pid : -1); if (pid == 0) { @@ -533,7 +533,7 @@ if ((pid = fork()) == -1) return NULL; } -ALARM_PID = pid; +ALARM_PID = (pid != 0 ? pid : -1); if (pid == 0) { -- 1.6.1
From 772af652a3b78fca88e0d777c10db84a61166efc Mon Sep 17 00:00:00 2001 From: Eygene Ryabinkin <r...@grid.kiae.ru> Date: Fri, 6 Feb 2009 16:36:51 +0300 Subject: [PATCH] CFRun: fix typo Signed-off-by: Eygene Ryabinkin <r...@grid.kiae.ru> --- src/cfrun.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/src/cfrun.c b/src/cfrun.c index 210d0fb..02fa497 100644 --- a/src/cfrun.c +++ b/src/cfrun.c @@ -430,7 +430,7 @@ if (!gotkey) if (!RemoteConnect(parsed_host,forceipv4,port,strport)) { - snprintf(OUTPUT,CF_BUFSIZE,"No reponse hailing %s\n",parsed_host); + snprintf(OUTPUT,CF_BUFSIZE,"No response hailing %s\n",parsed_host); CfLog(cferror,OUTPUT,"socket"); FileOutput(fp, fopl_error,OUTPUT); if (CONN->sd != CF_NOT_CONNECTED) -- 1.6.1
_______________________________________________ Bug-cfengine mailing list Bug-cfengine@cfengine.org https://cfengine.org/mailman/listinfo/bug-cfengine