Re: [modwsgi] Daemon process lifecycle; how are they killed? (and how can we debug if they aren't)

robert . waters Fri, 18 Nov 2016 04:13:56 -0800

Thanks Graham.

They look pretty normal:


USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      2673  0.0  0.0  61272  3276 ?        Ss   04:09   0:01 /usr/sbin/httpd.worker
apache    1201  0.1  0.0 739448  8668 ?        Sl   06:32   0:01 /usr/sbin/httpd.worker
svcuser  12840  0.0  0.0 455436 22876 ?        Sl   05:19   0:03 daemon-display-name
svcuser  23339  0.0  0.0 237320  5392 ?        Sl   Nov17   0:00 daemon-display-name  <-- orphan
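The 'ps auxwww' listing above does not include parent PIDs, so orphans have to be spotted by eye. Below is a minimal sketch for finding them mechanically. It assumes Python 3.7+ on Linux with procps 'ps', run outside Apache rather than inside the Python 2.7 application. It asks 'ps' for pid, ppid, user, stat and args with empty headers. It keeps only rows whose command starts with the daemon's display name, then flags those with PPID 1 or a Z/D state. "daemon-display-name" is the placeholder used in this thread; in practice it is whatever display-name= is set to on WSGIDaemonProcess. parse_ps(), suspicious_daemons() and snapshot() are hypothetical helpers, not part of mod_wsgi.

```python
import subprocess


def parse_ps(text):
    """Parse `ps -e -o pid= -o ppid= -o user= -o stat= -o args=` output."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 4)  # args (last column) may contain spaces
        if len(parts) < 5:
            continue
        pid, ppid, user, stat, args = parts
        rows.append({"pid": int(pid), "ppid": int(ppid), "user": user,
                     "stat": stat, "args": args})
    return rows


def suspicious_daemons(rows, display_name):
    """Return (pid, reasons) for matching processes that look abandoned."""
    found = []
    for row in rows:
        if not row["args"].startswith(display_name):
            continue
        reasons = []
        if row["ppid"] == 1:
            reasons.append("orphan (PPID 1)")
        if row["stat"].startswith("Z"):
            reasons.append("zombie")
        elif row["stat"].startswith("D"):
            reasons.append("uninterruptible sleep")
        if reasons:
            found.append((row["pid"], reasons))
    return found


def snapshot():
    """Run ps once; empty headers ("pid=" etc.) suppress the header line."""
    cmd = ["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "user=",
           "-o", "stat=", "-o", "args="]
    out = subprocess.run(cmd, capture_output=True, text=True,
                         check=True).stdout
    return parse_ps(out)
```

Calling suspicious_daemons(snapshot(), "daemon-display-name") right after each Apache restart would show whether orphans appear on every restart or only on some. That may help with the 'why now' question below.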

Note that we do *not* see the PIDs of our daemon processes in the Apache error
log when it shuts down. We only see the PIDs of the non-mod_wsgi workers that
handle server-status et al. So for the output above, the httpd log would show
shutdown problems only for pid 1201.
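To check that observation across many restarts rather than by eye, here is a small Python 3 sketch. It pulls every "child process N still did not exit, sending a SIG..." warning out of the error log, using the message text quoted further down in this thread. It then reports which PIDs Apache escalated to SIGKILL, and those PIDs can be compared with the daemon PIDs from 'ps'. escalations() and killed_pids() are hypothetical helper names. The regular expression only covers the message wording shown in this thread.

```python
import re
from collections import defaultdict

# Matches the Apache 2.2 shutdown warnings quoted in this thread, e.g.
# "[warn] child process 23371 still did not exit, sending a SIGTERM"
ESCALATION_RE = re.compile(
    r"child process (\d+) still did not exit, sending a (SIG[A-Z]+)")


def escalations(log_lines):
    """Map each PID to the signals Apache logged for it, in log order."""
    sent = defaultdict(list)
    for line in log_lines:
        match = ESCALATION_RE.search(line)
        if match:
            sent[int(match.group(1))].append(match.group(2))
    return dict(sent)


def killed_pids(log_lines):
    """PIDs for which Apache fell back to SIGKILL."""
    return sorted(pid for pid, sigs in escalations(log_lines).items()
                  if "SIGKILL" in sigs)
```

Pass it an open file object for whatever ErrorLog points at; a path such as /var/log/httpd/error_log is only an assumption.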

This issue has been around for a while; we have observed it here and there in
the past, but recently it has gotten worse and is causing resource exhaustion,
so we're trying to answer 'why now' in addition to 'why'.


Appreciate the help.


On Thursday, November 17, 2016 at 11:20:12 PM UTC-5, Graham Dumpleton wrote:
>
>
> > On 18 Nov 2016, at 2:39 PM, [email protected] wrote:
> > 
> > Hello, 
> > 
> > We are having an issue using Apache/2.2.15 (Unix) mod_wsgi/3.3
> > Python/2.7.3 (worker MPM, daemon mode) where Apache restarts cause daemon
> > processes to become orphaned: they are re-parented to PID 1 and continue to
> > run app code but no longer take HTTP requests.
> > 
> > Each time the error occurs, we will see something like:
> > [Thu Nov 17 22:15:00 2016] [warn] child process 23371 still did not exit, sending a SIGTERM
> > [Thu Nov 17 22:15:02 2016] [warn] child process 23371 still did not exit, sending a SIGTERM
> > [Thu Nov 17 22:15:04 2016] [warn] child process 23371 still did not exit, sending a SIGTERM
> > [Thu Nov 17 22:15:06 2016] [error] child process 23371 still did not exit, sending a SIGKILL
> >
> > .. where pid 23371 was an httpd worker.
> > 
> > This leads me to assume that the root process (the initial httpd process,
> > owned by root) sends (TERM, TERM, TERM, KILL) to the worker(s), which then
> > try to kill the daemon processes but can't for some reason, and that causes
> > them not to respond to their parent's requests to die. However, this does
> > not make sense to me, because that worker runs as the low-privilege apache
> > user, which cannot kill our daemon processes (they have a different
> > uid/gid). We have tried permutations of different users and privileges and
> > nothing helps.
> > 
> > We can easily send a TERM to any of the daemon processes manually
> > (orphaned or not), and they die cleanly in well under the 3-second window
> > that Apache uses. They die, and mod_wsgi writes a message to the httpd log
> > saying they were aborted. It just doesn't happen when httpd tries to do
> > it.
> > 
> > We are using C modules, we have set WSGIApplicationGroup %{GLOBAL}, and as
> > far as we can tell our permissions and vhost configuration are correct. The
> > application works well at runtime.
> > 
> > To continue debugging this, we were hoping to find out exactly how the
> > daemons are signaled that they should exit. Tracing the daemon processes
> > with sysdig shows no sign of them receiving any termination signal from
> > httpd.
> > 
> > Any ideas or tips on how to put the pieces together? 
>
> The signals to shut down should be sent by the Apache root process, which
> runs as root. There is no way the daemon processes should be able to ignore
> the SIGKILL. The only way the processes should be able to hang around is if
> they were hung on some resource such as an NFS mount (uninterruptible sleep,
> state D), or had become zombie processes (state Z). A zombie will not
> actually be running; it only occupies a slot in the process table and
> nothing more.
>
> I really need to see the output of ‘ps auxwww’ so I can see the PIDs, their
> relationship to the other httpd processes, the process state, and whether it
> is a zombie (Z).
>
> Overall, there is not much I can do to help, as you are on an ancient
> Apache/mod_wsgi version. From memory, I have seen some complaints of
> something similar before, but they all revolved around the use of Apache
> 2.2.12-2.2.16. I have never seen anything similar since, so I have always
> suspected some strange issue with Apache around those versions.
>
> Graham 
>
>
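The log excerpt quoted above shows Apache sending SIGTERM three times about two seconds apart before falling back to SIGKILL. The manual test reports that the daemons exit on a single TERM. Below is a minimal Python 3 sketch for replaying that sequence by hand against one daemon PID, to see which signal actually ends it. The three attempts and the 2-second interval are read off the log timestamps, not taken from Apache's source. terminate() is a hypothetical helper, not how Apache or mod_wsgi implement shutdown. It must run as root or as the daemon's own user; otherwise os.kill() raises PermissionError. SIGKILL cannot be caught or ignored, so a process still present after it is either stuck in uninterruptible sleep (D) or a zombie (Z).

```python
import os
import signal
import time


def _gone(pid):
    """True once pid has exited, reaping it first if it is our own child."""
    try:
        reaped, _ = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return True
    except ChildProcessError:
        pass  # not our child: its real parent (or init) does the reaping
    try:
        os.kill(pid, 0)  # signal 0 only checks existence and permission
    except ProcessLookupError:
        return True
    return False


def _wait_gone(pid, timeout, poll):
    """Poll until pid is gone or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if _gone(pid):
            return True
        time.sleep(poll)
    return _gone(pid)


def terminate(pid, term_attempts=3, interval=2.0, poll=0.1):
    """Send SIGTERM up to term_attempts times, then SIGKILL.

    Returns the name of the signal after which pid disappeared, or None
    if it was still present one interval after SIGKILL.
    """
    sent = None
    for sig in [signal.SIGTERM] * term_attempts + [signal.SIGKILL]:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            if sent is None:
                raise  # pid did not exist to begin with
            return sent  # exited just before this signal was sent
        sent = sig.name
        if _wait_gone(pid, interval, poll):
            return sent
    return None
```

For example, terminate(23339) against the orphan from the 'ps' listing would return "SIGTERM" if it behaves as in the manual test. A None result would point at the D/Z states.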

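Since sysdig showed no termination signals arriving, it may also be worth ruling out that a daemon blocks or ignores SIGTERM. On Linux, /proc/<pid>/status exposes SigPnd, ShdPnd, SigBlk, SigIgn and SigCgt as hexadecimal bitmasks in which bit n-1 stands for signal n. It also lists State and PPid. Here is a minimal Python 3 sketch (Linux only; decode_mask() and signal_report() are hypothetical helpers) that decodes them for one PID. SIGTERM showing up in SigIgn, or pending in ShdPnd while set in SigBlk, would explain a daemon that outlives Apache's TERMs. A State of Z or D matches the zombie/hung case discussed in the reply. SigBlk and SigPnd here describe only the main thread; each thread has its own under /proc/<pid>/task/<tid>/status.

```python
import signal

# Signal-related bitmask fields in /proc/<pid>/status
MASK_FIELDS = ("SigPnd", "ShdPnd", "SigBlk", "SigIgn", "SigCgt")


def decode_mask(hex_mask):
    """Turn a /proc status signal mask into signal names (bit n-1 = signal n)."""
    value = int(hex_mask, 16)
    names = []
    for signum in range(1, 65):
        if value & (1 << (signum - 1)):
            try:
                names.append(signal.Signals(signum).name)
            except ValueError:
                names.append(f"SIG{signum}")  # numbers Python has no name for
    return names


def signal_report(pid):
    """Read State, PPid and the decoded signal masks for pid (Linux only)."""
    report = {}
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            key, _, value = line.partition(":")
            value = value.strip()
            if key in ("State", "PPid"):
                report[key] = value
            elif key in MASK_FIELDS:
                report[key] = decode_mask(value)
    return report
```

Running signal_report() against both a healthy daemon and an orphan, as the daemon's user or root, gives two reports that can be compared directly.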
-- 
You received this message because you are subscribed to the Google Groups 
"modwsgi" group.
Visit this group at https://groups.google.com/group/modwsgi.
