Re: [modwsgi] Mod_wsgi daemon processes stopped and not restarted when maximum-requests reached

Greg Lowe Tue, 07 Jan 2014 12:07:22 -0800

Dredging up this old thread.

Till now we've been limping along waiting to upgrade Debian - and 
performing manual restarts whenever a process goes down.


We upgraded Debian a couple of days ago and were hoping this problem would 
be gone for good, but unfortunately have experienced it again.

Software versions:
apache2     2.2.22-13
libapache2-mod-wsgi    3.3-4

Current config:
WSGIScriptAlias / /home/XXXXX/scripts/XXXXX.wsgi
WSGIDaemonProcess epal user=XXXXX group=XXXXX processes=2 threads=25 
maximum-requests=0 shutdown-timeout=15 inactivity-timeout=60 
display-name=%{GROUP}
WSGIProcessGroup XXXXX

So I guess the next step is to remove the inactivity timeout and set up a 
cron job to periodically restart apache.

Cheers,
Greg.

On Saturday, January 5, 2013 10:18:17 PM UTC+13, Graham Dumpleton wrote:
>
> Okay, here is the history of this issue.
>
> I have seen reports of it probably less than a dozen times that I 
> recollect. If more, it would not be many more.
>
> Although it is mod_wsgi that triggers the shutdown of the daemon process, 
> it is process management code within the Apache Runtime Library that deals 
> with the consequences of the child processing disappearing. The only thing 
> mod_wsgi provides is some simple callback code to be notified when a 
> process has died and depending on the circumstances restart it. The only 
> case one shouldn't be restarted to replace one which has died is if Apache 
> is being restarted or shutdown.
>
> These two recent reports are the first time I have seen the issue brought 
> up for a long time. Previously, based on the fact that reports of the 
> problem went away, it was suspected that it was some Apache issue that was 
> fixed around Apache version 2.2.16/2.2.17 due to a change in the version of 
> the Apache Runtime Library being used at the time.
>
> Michael, in your case you say you are using Apache 2.2.3 and Greg you said 
> you were using stock Debian version. Debian has a history of not keeping up 
> with the latest versions of software, so depending on what Debian version 
> you are using, you could also be using an old version of Apache.
>
> So that is about all I can tell you at the moment. That is, that I would 
> suspect that it is going to be due to an older version of Apache being 
> used. The issue just hasn't been seen since about Apache 2.2.16/2.2.17.
>
> Sorry about the slow replies. Life is confusing right now and finding time 
> and motivation to keep on top of the few mod_wsgi questions that do come up 
> is proving challenging.
>
> Graham
>
>
>
> On 4 January 2013 10:06, Michael Bayer <[email protected] <javascript:>>wrote:
>
>> We're seeing behavior that also might be this same issue.   This is again 
>> a low-use server with a maximum-requests of 1000, which is set that way 
>> since we were observing slow memory leaks in one application.   We're on 
>> Apache 2.2.3, I'm not seeing here what version of mod_wsgi we're on (unless 
>> there's some way to divine that from the .so file).
>>
>> The symptom is, the server serves its last request, and the next one is a 
>> 500 server error.  The app server doesn't seem to play any role at that 
>> point and the error log has the error "Premature end of script headers: 
>> ".   It's not clear to me without asking the admins some more if the server 
>> no longer writes any more logs after that, as well as whether or not the 
>> daemon process is still present in the ps listing.
>>
>>
>>
>>
>> On Tuesday, October 2, 2012 1:40:16 AM UTC-4, Graham Dumpleton wrote:
>>
>>> Sorry for the slow reply, been on long holiday. 
>>>
>>> Why are you using maximum-requests and inactivity-timeout? 
>>>
>>> I would always recommended against using these options if there is no 
>>> real requirement as unnecessary restarts should always be avoided 
>>> because the startup costs of loading up fat Python web applications 
>>> can significantly impact server performance if your server is under 
>>> constant load. 
>>>
>>> IOW, the last thing you want to happen if you are under loading, 
>>> especially a load spike, is to restart the application due to hitting 
>>> maximum-requests. 
>>>
>>> Anyway, I don't recollect specifically a case where daemon process 
>>> wouldn't come back after being killed off. Are you absolutely sure, 
>>> from running 'ps' and checking process IDs that they didn't exist in 
>>> any state in the process table? Are the process perhaps still there 
>>> and haven't actually shutdown properly? 
>>>
>>> The message: 
>>>
>>> [Fri Sep 21 09:10:23 2012] [info] mod_wsgi (pid=3845): Aborting process 
>>> 'XXusername-omittedXX'. 
>>>
>>> indicates there certainly wasn't a clean shutdown, possibly because of 
>>> a threading dead lock or hung request thread. However, when you get 
>>> that message there is a C level call to exit() immediately after so 
>>> the process should die. 
>>>
>>> Technically calling exit() doesn't absolutely guarantee shutdown as a 
>>> C level atexit() handler could block but neither Apache or mod_wsgi 
>>> should be setting up any of those. 
>>>
>>> The problem with maximum-requests and inactivity-timeout is that they 
>>> are a self initiated restart and a graceful one at that. There is 
>>> therefore no absolute failsafe if exit() doesn't work as there would 
>>> if it was an Apache restart. In the case of an Apache restart, the 
>>> Apache parent process will send a SIGKILL to the child processes if 
>>> they haven't exited after 3 seconds. For a self initiated restart, the 
>>> parent process doesn't know it is restarting so if something prevents 
>>> the process from properly dying, then it will not be notified and 
>>> replace the process. 
>>>
>>> In summary, validate for sure whether the daemon process exist still 
>>> or not in the process table and in what state. 
>>>
>>> If they do still exist, try and use gdb as described in: 
>>>
>>> http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Debugging_
>>> Crashes_With_GDB 
>>>
>>> to work out where they are blocked. 
>>>
>>> That all said, don't recollect seeing this specific issue with 
>>> maximum-requests, have seen similar outcome of processes not handling 
>>> requests in the past. Generally this was always because of application 
>>> level code blocking exhausting threads. There were some unexplained 
>>> cases, but this seemed to vanish with no changes to mod_wsgi. There 
>>> was suspicion though at the time that for some reason the distro 
>>> Apache was causing problems. From memory this was for Apache in patch 
>>> version range around 15-17 of 2.2 and you are using 16. Is there a 
>>> reason why you can't upgrade to a newer Apache version than what you 
>>> are using? 
>>>
>>> Graham 
>>>
>>> On 24 September 2012 10:19, Greg Lowe <[email protected]> wrote: 
>>> > I've recently run into some stability issues with three django sites 
>>> that we 
>>> > host. These are running on Apache and Mod_wsgi, and are low traffic 
>>> sites. 
>>> > I’m more concerned with reliability than maximising throughput. 
>>> > 
>>> > I’ve included a summary of what has been observed so far below. I 
>>> would 
>>> > greatly appreciate any advice on how to diagnose and/or fix this 
>>> problem. 
>>> > 
>>> > The sites are configured to use daemon processes with a 
>>> maximum-requests 
>>> > setting. Apache is running in prefork mode. i.e. WSGIDaemonProcess 
>>> > usernamehere user=usernamehere group=usernamehere processes=2 
>>> threads=25 
>>> > maximum-requests=100 inactivity-timeout=60 display-name=%{GROUP} 
>>> > 
>>> > We’ve experienced a number of outages where both of the daemon 
>>> processes 
>>> > disappear, and are not recreated by mod_wsgi. Restarting Apache causes 
>>> them 
>>> > to be restarted correctly. 
>>> > 
>>> > Occasionally I’ve seen the sites being served from only a single 
>>> daemon 
>>> > process although it’s configured to use two. An apache restart fixes 
>>> this. 
>>> > 
>>> > Our server admin has managed to reproduce the problem on our test 
>>> > environment by running apache bench at around 100 requests per second 
>>> for 
>>> > ~70k requests. (He’s also done this with INFO level logging turned on 
>>> - see 
>>> > the log snippets at the end of the email.) 
>>> > 
>>> > These are the typical errors seen in the error log from an outage: 
>>> > 
>>> > Many of these: Script timed out before returning headers: django.wsgi 
>>> > 
>>> > Many of these: Premature end of script headers: django.wsgi 
>>> > 
>>> > Occasionally one of these: (4)Interrupted system call: mod_wsgi 
>>> (pid=19990): 
>>> > Unable to connect to WSGI daemon process 'usernamehere’' on 
>>> > '/var/run/apache2/wsgi.25099.19.3.sock' after multiple attempts. 
>>> > 
>>> > Occasionally one of these: (2)No such file or directory: mod_wsgi 
>>> > (pid=10877): Unable to connect to WSGI daemon process 'usernamehere' 
>>> on 
>>> > '/var/run/apache2/wsgi.5194.16.3.sock' after multiple attempts. 
>>> > 
>>> > 
>>> > One of the sites has been running for about a year, the others only a 
>>> few 
>>> > months. These problems first started a few weeks ago. 
>>> > 
>>> > The sites were migrated from a server running an older version of 
>>> Debian a 
>>> > little over a month ago. There was a few weeks in between the 
>>> migration and 
>>> > the problems appearing. So I’m fairly confident that we can attribute 
>>> these 
>>> > problems to the newer version of mod_wsgi, but due to the timing I 
>>> can’t be 
>>> > certain. 
>>> > 
>>> > Old server versions: 
>>> > 
>>> > Debian: Lenny 
>>> > 
>>> > Apache: apache2 2.2.9-10+lenny8, apache-mpm-prefork 2.2.9-10+lenny8 
>>> > 
>>> > mod_wsgi: libapache2-mod-wsgi 2.5-1~lenny1 
>>> > 
>>> > Django: 1.3.1 
>>> > 
>>> > 
>>> > Current server versions: 
>>> > 
>>> > Debian: Squeeze 
>>> > 
>>> > Apache: apache2 2.2.16-6+squeeze7, apache2-mpm-prefork 
>>> 2.2.16-6+squeeze7 
>>> > 
>>> > Mod_wsgi: libapache2-mod-wsgi 3.3-2 
>>> > 
>>> > Django: 1.3.1 
>>> > 
>>> > 
>>> > Our system admin has reproduced the problem on our test environment 
>>> with 
>>> > INFO level logging turned on. Here are his comments: 
>>> > 
>>> > I set the apache log level for the test site and ab'd it to death. 
>>> > 
>>> > The pertinent log entries are: 
>>> > 
>>> > [Fri Sep 21 09:10:22 2012] [info] Initial (No.1) HTTPS request 
>>> received 
>>> > for child 70 (server XXXXXomittedXXXXXX:443) 
>>> > [Fri Sep 21 09:10:22 2012] [info] mod_wsgi (pid=3845): Maximum 
>>> requests 
>>> > reached 'XXusername-omittedXX'. 
>>> > [Fri Sep 21 09:10:22 2012] [info] mod_wsgi (pid=3845): Shutdown 
>>> > requested 'XXusername-omittedXX'. 
>>> > [Fri Sep 21 09:10:22 2012] [info] Initial (No.1) HTTPS request 
>>> received 
>>> > for child 27 (server XXXXXomittedXXXXXX:443) 
>>> > .... 
>>> > [Fri Sep 21 09:10:23 2012] [info] Initial (No.1) HTTPS request 
>>> received 
>>> > for child 28 (server XXXXXomittedXXXXXX:443) 
>>> > [Fri Sep 21 09:10:23 2012] [info] mod_wsgi (pid=3845): Aborting 
>>> process 
>>> > 'XXusername-omittedXX'. 
>>> > [Fri Sep 21 09:10:23 2012] [error] [client 192.168.146.19] Premature 
>>> end 
>>> > of script headers: django.wsgi 
>>> > [Fri Sep 21 09:10:23 2012] [info] Initial (No.1) HTTPS request 
>>> received 
>>> > for child 85 (server XXXXXomittedXXXXXX:443) 
>>> > .... 
>>> > [Fri Sep 21 09:10:23 2012] [info] Initial (No.1) HTTPS request 
>>> received 
>>> > for child 46 (server XXXXXomittedXXXXXX:443) 
>>> > [Fri Sep 21 09:15:20 2012] [error] [client 192.168.146.19] Script 
>>> timed 
>>> > out before returning headers: django.wsgi 
>>> > [Fri Sep 21 09:15:20 2012] [error] [client 192.168.146.19] Script 
>>> timed 
>>> > out before returning headers: django.wsgi 
>>> > 
>>> > So it seems that there is an unclean shutdown after wsgi's max 
>>> requests 
>>> > count is reached. 
>>> > 
>>> > A normal shutdown looks like this: 
>>> > 
>>> > [Fri Sep 21 09:10:17 2012] [info] Initial (No.1) HTTPS request 
>>> received 
>>> > for child 73 (server XXXXXomittedXXXXXX:443) 
>>> > [Fri Sep 21 09:10:17 2012] [info] mod_wsgi (pid=3691): Maximum 
>>> requests 
>>> > reached 'XXusername-omittedXX'. 
>>> > [Fri Sep 21 09:10:17 2012] [info] mod_wsgi (pid=3691): Shutdown 
>>> > requested 'XXusername-omittedXX'. 
>>> > [Fri Sep 21 09:10:17 2012] [info] Initial (No.1) HTTPS request 
>>> received 
>>> > for child 79 (server XXXXXomittedXXXXXX:443) 
>>> > .... 
>>> > [Fri Sep 21 09:10:18 2012] [info] Initial (No.1) HTTPS request 
>>> received 
>>> > for child 27 (server XXXXXomittedXXXXXX:443) 
>>> > [Fri Sep 21 09:10:18 2012] [info] mod_wsgi (pid=3691): Stopping 
>>> process 
>>> > 'XXusername-omittedXX'. 
>>> > [Fri Sep 21 09:10:18 2012] [info] mod_wsgi (pid=3691): Destroying 
>>> > interpreters. 
>>> > [Fri Sep 21 09:10:18 2012] [info] mod_wsgi (pid=3691): Destroy 
>>> > interpreter 'XXXXXomittedXXXXXX|'. 
>>> > [Fri Sep 21 09:10:18 2012] [info] mod_wsgi (pid=3691): Cleanup 
>>> > interpreter ''. 
>>> > [Fri Sep 21 09:10:18 2012] [info] Initial (No.1) HTTPS request 
>>> received 
>>> > for child 61 (server XXXXXomittedXXXXXX:443) 
>>> > .... 
>>> > 
>>> > [Fri Sep 21 09:10:18 2012] [info] Initial (No.1) HTTPS request 
>>> received 
>>> > for child 19 (server XXXXXomittedXXXXXX:443) 
>>> > [Fri Sep 21 09:10:18 2012] [info] mod_wsgi (pid=3691): Terminating 
>>> Python. 
>>> > [Fri Sep 21 09:10:18 2012] [info] mod_wsgi (pid=3691): Python has 
>>> shutdown. 
>>> > [Fri Sep 21 09:10:18 2012] [info] mod_wsgi (pid=3845): Attach 
>>> > interpreter ''. 
>>> > [Fri Sep 21 09:10:18 2012] [info] mod_wsgi (pid=3845): Create 
>>> > interpreter 'XXXXXomittedXXXXXX|'. 
>>> > [Fri Sep 21 09:10:18 2012] [info] [client 192.168.146.19] mod_wsgi 
>>> > (pid=3845, process='XXusername-omittedXX', 
>>> > application='XXXXXomittedXXXXXX|'): Loading WSGI script 
>>> > '/home/XXusername-omittedXX/scripts/django.wsgi'. 
>>> > [Fri Sep 21 09:10:19 2012] [info] Initial (No.1) HTTPS request 
>>> received 
>>> > for child 85 (server XXXXXomittedXXXXXX:443) 
>>> > 
>>> > 
>>> > Thanks for making all the way through this long post ;) 
>>> > 
>>> > -- 
>>> > You received this message because you are subscribed to the Google 
>>> Groups 
>>> > "modwsgi" group. 
>>> > To view this discussion on the web visit 
>>> > https://groups.google.com/d/msg/modwsgi/-/NSg2A39f7DYJ. 
>>> > To post to this group, send email to [email protected]. 
>>> > To unsubscribe from this group, send email to 
>>> > [email protected]. 
>>> > For more options, visit this group at 
>>> > http://groups.google.com/group/modwsgi?hl=en. 
>>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "modwsgi" group.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msg/modwsgi/-/Lc4lrAOyyvwJ.
>>
>> To post to this group, send email to [email protected]<javascript:>
>> .
>> To unsubscribe from this group, send email to 
>> [email protected] <javascript:>.
>> For more options, visit this group at 
>> http://groups.google.com/group/modwsgi?hl=en.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"modwsgi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/modwsgi.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [modwsgi] Mod_wsgi daemon processes stopped and not restarted when maximum-requests reached

Reply via email to