Iain,

many thanks for your work on this. This is a big patch so I will study 
it carefully before applying it.

M

Iain Patterson wrote:
>   I think I have a fix for the MAXFD issue.  Or at least one cause of 
> it.  What appears to be happening is:
> 
>   cfexecd calls MailResult() to decide whether to send an email after 
> fork()ing cfagent.  Control reaches FileChecksum() at line 714 of 
> cfexecd.c.
> 
> -----BEGIN SNIPPET-----
>   switch (type)
>      {
>      case 's': md = EVP_get_digestbyname("sha");
>           break;
>      case 'm': md = EVP_get_digestbyname("md5");
>           break;
>      default: FatalError("Software failure in ChecksumFile");
>      }
> 
>   if (!md)
>      {
>      return 0;
>      }
> -----END SNIPPET-----
> 
>   Under the debugger I saw EVP_get_digestbyname() return NULL and 
> ChecksumFile() return 0.  Quite why this is I don't know.  The upshot is 
> that the filehandle opened to perform the digest calculation is not closed.
> 
>   This results in file descriptor leakage.  With lsof I could see 
> cfexecd holding open these log files until eventually (after sixteen or 
> so scheduled cfagent runs) the number of open filedescriptors exceeded 
> MAXFD (20).
> 
>   On the next run LocalExec() calls cfpopen() to spawn cfagent. 
> cfpopen() fork()s a child, which exec()s cfagent.  The parent then 
> reaches line 191 of popen.c.
> 
> -----BEGIN SNIPPET-----
>    if (fileno(pp) >= MAXFD)
>       {
>       snprintf(OUTPUT,CF_BUFSIZE,"File descriptor %d of child %d higher 
> than MAXFD, check for defunct children", fileno(pp), pid);
>       CfLog(cferror,OUTPUT,"");
>       return NULL;
>       }
>    else
>       {
>       CHILD[fileno(pp)] = pid;
>       return pp;
>       }
> -----END SNIPPET-----
> 
>   cfopen() has returned a new file pointer to LocalExec(), which handles 
> this gracefully, aborting the LocalExec() run.  But it therefore never 
> gets to cfpclose() and hence doesn't wait() for the cfagent child, which 
> zombies.
> 
>   The attached patch fixes the problem by being more careful when MAXFD 
> is exceeded and also by eliminating the particular cause of leakage that 
> I saw.
> 
>   Firstly FileChecksum() closes the new filehandle before returning 0 in 
> the case where EVP_get_digestbyname() fails.
> 
>   In popen.c I've moved the wait()/waitpid() code to a new function, 
> cfpwait(pid_t) and called it not only from the original location in 
> cfpclose() but also before returning NULL from cfpopen*() if the fork() 
> has already completed.
> 
>   Even if MAXFD did still end up being exceeded, for example because 
> there is another cause of leakage I didn't see, cfpclose() will now wait 
> for "pid" -1, ie any child, and so not leave a zombie.  It can't look up 
> the specific pid from CHILD[fd] as fd is necessarily outside the array 
> bounds.
> 
>   However, there may be more to this.  A cursory grep of the source tree 
> shows that the return value of cfpclose() is thrown away everywhere 
> except for within package.c.  While moving stuff into cfpwait() I 
> noticed something a bit odd.  The code looks like this:
> 
> -----BEGIN SNIPPET-----
> #ifdef HAVE_WAITPID
> 
> while (waitpid(pid, &status, 0) < 0
> ...
> return status;
> 
> #else
> 
> while ((wait_result = wait(&status)) != pid)
> ...
> if (WIFSIGNALED(status))
> ...
> if (! WIFEXITED(status))
> ...
> return (WEXITSTATUS(status));
> #endif
> -----END SNIPPET-----
> 
>   I suspect the #endif should come above the WIFSIGNALED() conditional.  
> I haven't moved it, however, as the return code is checked in, eg 
> FreeBSDPackageList().  Nonetheless it may be, given that people were 
> seeing funky behaviour with RPM packages, that the return value of 
> cfpopen() was causing trouble.
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Bug-cfengine mailing list
> [email protected]
> https://cfengine.org/mailman/listinfo/bug-cfengine

-- 


Mark Burgess

Web: http://www.iu.hio.no/~mark
Tlf: +47 22453272
_______________________________________________
Bug-cfengine mailing list
[email protected]
https://cfengine.org/mailman/listinfo/bug-cfengine

Reply via email to