Dredging up this old thread.
Till now we've been limping along waiting to upgrade Debian - and
performing manual restarts whenever a process goes down.
We upgraded Debian a couple of days ago and were hoping this problem would
be gone for good, but unfortunately have experienced it again.
Software versions:
apache2 2.2.22-13
libapache2-mod-wsgi 3.3-4
Current config:
WSGIScriptAlias / /home/XXXXX/scripts/XXXXX.wsgi
WSGIDaemonProcess epal user=XXXXX group=XXXXX processes=2 threads=25
maximum-requests=0 shutdown-timeout=15 inactivity-timeout=60
display-name=%{GROUP}
WSGIProcessGroup XXXXX
So I guess the next step is to remove the inactivity timeout and set up a
cron job to periodically restart apache.
Cheers,
Greg.
On Saturday, January 5, 2013 10:18:17 PM UTC+13, Graham Dumpleton wrote:
>
> Okay, here is the history of this issue.
>
> I have seen reports of it probably less than a dozen times that I
> recollect. If more, it would not be many more.
>
> Although it is mod_wsgi that triggers the shutdown of the daemon process,
> it is process management code within the Apache Runtime Library that deals
> with the consequences of the child processing disappearing. The only thing
> mod_wsgi provides is some simple callback code to be notified when a
> process has died and depending on the circumstances restart it. The only
> case one shouldn't be restarted to replace one which has died is if Apache
> is being restarted or shutdown.
>
> These two recent reports are the first time I have seen the issue brought
> up for a long time. Previously, based on the fact that reports of the
> problem went away, it was suspected that it was some Apache issue that was
> fixed around Apache version 2.2.16/2.2.17 due to a change in the version of
> the Apache Runtime Library being used at the time.
>
> Michael, in your case you say you are using Apache 2.2.3 and Greg you said
> you were using stock Debian version. Debian has a history of not keeping up
> with the latest versions of software, so depending on what Debian version
> you are using, you could also be using an old version of Apache.
>
> So that is about all I can tell you at the moment. That is, that I would
> suspect that it is going to be due to an older version of Apache being
> used. The issue just hasn't been seen since about Apache 2.2.16/2.2.17.
>
> Sorry about the slow replies. Life is confusing right now and finding time
> and motivation to keep on top of the few mod_wsgi questions that do come up
> is proving challenging.
>
> Graham
>
>
>
> On 4 January 2013 10:06, Michael Bayer <[email protected] <javascript:>>wrote:
>
>> We're seeing behavior that also might be this same issue. This is again
>> a low-use server with a maximum-requests of 1000, which is set that way
>> since we were observing slow memory leaks in one application. We're on
>> Apache 2.2.3, I'm not seeing here what version of mod_wsgi we're on (unless
>> there's some way to divine that from the .so file).
>>
>> The symptom is, the server serves its last request, and the next one is a
>> 500 server error. The app server doesn't seem to play any role at that
>> point and the error log has the error "Premature end of script headers:
>> ". It's not clear to me without asking the admins some more if the server
>> no longer writes any more logs after that, as well as whether or not the
>> daemon process is still present in the ps listing.
>>
>>
>>
>>
>> On Tuesday, October 2, 2012 1:40:16 AM UTC-4, Graham Dumpleton wrote:
>>
>>> Sorry for the slow reply, been on long holiday.
>>>
>>> Why are you using maximum-requests and inactivity-timeout?
>>>
>>> I would always recommended against using these options if there is no
>>> real requirement as unnecessary restarts should always be avoided
>>> because the startup costs of loading up fat Python web applications
>>> can significantly impact server performance if your server is under
>>> constant load.
>>>
>>> IOW, the last thing you want to happen if you are under loading,
>>> especially a load spike, is to restart the application due to hitting
>>> maximum-requests.
>>>
>>> Anyway, I don't recollect specifically a case where daemon process
>>> wouldn't come back after being killed off. Are you absolutely sure,
>>> from running 'ps' and checking process IDs that they didn't exist in
>>> any state in the process table? Are the process perhaps still there
>>> and haven't actually shutdown properly?
>>>
>>> The message:
>>>
>>> [Fri Sep 21 09:10:23 2012] [info] mod_wsgi (pid=3845): Aborting process
>>> 'XXusername-omittedXX'.
>>>
>>> indicates there certainly wasn't a clean shutdown, possibly because of
>>> a threading dead lock or hung request thread. However, when you get
>>> that message there is a C level call to exit() immediately after so
>>> the process should die.
>>>
>>> Technically calling exit() doesn't absolutely guarantee shutdown as a
>>> C level atexit() handler could block but neither Apache or mod_wsgi
>>> should be setting up any of those.
>>>
>>> The problem with maximum-requests and inactivity-timeout is that they
>>> are a self initiated restart and a graceful one at that. There is
>>> therefore no absolute failsafe if exit() doesn't work as there would
>>> if it was an Apache restart. In the case of an Apache restart, the
>>> Apache parent process will send a SIGKILL to the child processes if
>>> they haven't exited after 3 seconds. For a self initiated restart, the
>>> parent process doesn't know it is restarting so if something prevents
>>> the process from properly dying, then it will not be notified and
>>> replace the process.
>>>
>>> In summary, validate for sure whether the daemon process exist still
>>> or not in the process table and in what state.
>>>
>>> If they do still exist, try and use gdb as described in:
>>>
>>> http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Debugging_
>>> Crashes_With_GDB
>>>
>>> to work out where they are blocked.
>>>
>>> That all said, don't recollect seeing this specific issue with
>>> maximum-requests, have seen similar outcome of processes not handling
>>> requests in the past. Generally this was always because of application
>>> level code blocking exhausting threads. There were some unexplained
>>> cases, but this seemed to vanish with no changes to mod_wsgi. There
>>> was suspicion though at the time that for some reason the distro
>>> Apache was causing problems. From memory this was for Apache in patch
>>> version range around 15-17 of 2.2 and you are using 16. Is there a
>>> reason why you can't upgrade to a newer Apache version than what you
>>> are using?
>>>
>>> Graham
>>>
>>> On 24 September 2012 10:19, Greg Lowe <[email protected]> wrote:
>>> > I've recently run into some stability issues with three django sites
>>> that we
>>> > host. These are running on Apache and Mod_wsgi, and are low traffic
>>> sites.
>>> > I’m more concerned with reliability than maximising throughput.
>>> >
>>> > I’ve included a summary of what has been observed so far below. I
>>> would
>>> > greatly appreciate any advice on how to diagnose and/or fix this
>>> problem.
>>> >
>>> > The sites are configured to use daemon processes with a
>>> maximum-requests
>>> > setting. Apache is running in prefork mode. i.e. WSGIDaemonProcess
>>> > usernamehere user=usernamehere group=usernamehere processes=2
>>> threads=25
>>> > maximum-requests=100 inactivity-timeout=60 display-name=%{GROUP}
>>> >
>>> > We’ve experienced a number of outages where both of the daemon
>>> processes
>>> > disappear, and are not recreated by mod_wsgi. Restarting Apache causes
>>> them
>>> > to be restarted correctly.
>>> >
>>> > Occasionally I’ve seen the sites being served from only a single
>>> daemon
>>> > process although it’s configured to use two. An apache restart fixes
>>> this.
>>> >
>>> > Our server admin has managed to reproduce the problem on our test
>>> > environment by running apache bench at around 100 requests per second
>>> for
>>> > ~70k requests. (He’s also done this with INFO level logging turned on
>>> - see
>>> > the log snippets at the end of the email.)
>>> >
>>> > These are the typical errors seen in the error log from an outage:
>>> >
>>> > Many of these: Script timed out before returning headers: django.wsgi
>>> >
>>> > Many of these: Premature end of script headers: django.wsgi
>>> >
>>> > Occasionally one of these: (4)Interrupted system call: mod_wsgi
>>> (pid=19990):
>>> > Unable to connect to WSGI daemon process 'usernamehere’' on
>>> > '/var/run/apache2/wsgi.25099.19.3.sock' after multiple attempts.
>>> >
>>> > Occasionally one of these: (2)No such file or directory: mod_wsgi
>>> > (pid=10877): Unable to connect to WSGI daemon process 'usernamehere'
>>> on
>>> > '/var/run/apache2/wsgi.5194.16.3.sock' after multiple attempts.
>>> >
>>> >
>>> > One of the sites has been running for about a year, the others only a
>>> few
>>> > months. These problems first started a few weeks ago.
>>> >
>>> > The sites were migrated from a server running an older version of
>>> Debian a
>>> > little over a month ago. There was a few weeks in between the
>>> migration and
>>> > the problems appearing. So I’m fairly confident that we can attribute
>>> these
>>> > problems to the newer version of mod_wsgi, but due to the timing I
>>> can’t be
>>> > certain.
>>> >
>>> > Old server versions:
>>> >
>>> > Debian: Lenny
>>> >
>>> > Apache: apache2 2.2.9-10+lenny8, apache-mpm-prefork 2.2.9-10+lenny8
>>> >
>>> > mod_wsgi: libapache2-mod-wsgi 2.5-1~lenny1
>>> >
>>> > Django: 1.3.1
>>> >
>>> >
>>> > Current server versions:
>>> >
>>> > Debian: Squeeze
>>> >
>>> > Apache: apache2 2.2.16-6+squeeze7, apache2-mpm-prefork
>>> 2.2.16-6+squeeze7
>>> >
>>> > Mod_wsgi: libapache2-mod-wsgi 3.3-2
>>> >
>>> > Django: 1.3.1
>>> >
>>> >
>>> > Our system admin has reproduced the problem on our test environment
>>> with
>>> > INFO level logging turned on. Here are his comments:
>>> >
>>> > I set the apache log level for the test site and ab'd it to death.
>>> >
>>> > The pertinent log entries are:
>>> >
>>> > [Fri Sep 21 09:10:22 2012] [info] Initial (No.1) HTTPS request
>>> received
>>> > for child 70 (server XXXXXomittedXXXXXX:443)
>>> > [Fri Sep 21 09:10:22 2012] [info] mod_wsgi (pid=3845): Maximum
>>> requests
>>> > reached 'XXusername-omittedXX'.
>>> > [Fri Sep 21 09:10:22 2012] [info] mod_wsgi (pid=3845): Shutdown
>>> > requested 'XXusername-omittedXX'.
>>> > [Fri Sep 21 09:10:22 2012] [info] Initial (No.1) HTTPS request
>>> received
>>> > for child 27 (server XXXXXomittedXXXXXX:443)
>>> > ....
>>> > [Fri Sep 21 09:10:23 2012] [info] Initial (No.1) HTTPS request
>>> received
>>> > for child 28 (server XXXXXomittedXXXXXX:443)
>>> > [Fri Sep 21 09:10:23 2012] [info] mod_wsgi (pid=3845): Aborting
>>> process
>>> > 'XXusername-omittedXX'.
>>> > [Fri Sep 21 09:10:23 2012] [error] [client 192.168.146.19] Premature
>>> end
>>> > of script headers: django.wsgi
>>> > [Fri Sep 21 09:10:23 2012] [info] Initial (No.1) HTTPS request
>>> received
>>> > for child 85 (server XXXXXomittedXXXXXX:443)
>>> > ....
>>> > [Fri Sep 21 09:10:23 2012] [info] Initial (No.1) HTTPS request
>>> received
>>> > for child 46 (server XXXXXomittedXXXXXX:443)
>>> > [Fri Sep 21 09:15:20 2012] [error] [client 192.168.146.19] Script
>>> timed
>>> > out before returning headers: django.wsgi
>>> > [Fri Sep 21 09:15:20 2012] [error] [client 192.168.146.19] Script
>>> timed
>>> > out before returning headers: django.wsgi
>>> >
>>> > So it seems that there is an unclean shutdown after wsgi's max
>>> requests
>>> > count is reached.
>>> >
>>> > A normal shutdown looks like this:
>>> >
>>> > [Fri Sep 21 09:10:17 2012] [info] Initial (No.1) HTTPS request
>>> received
>>> > for child 73 (server XXXXXomittedXXXXXX:443)
>>> > [Fri Sep 21 09:10:17 2012] [info] mod_wsgi (pid=3691): Maximum
>>> requests
>>> > reached 'XXusername-omittedXX'.
>>> > [Fri Sep 21 09:10:17 2012] [info] mod_wsgi (pid=3691): Shutdown
>>> > requested 'XXusername-omittedXX'.
>>> > [Fri Sep 21 09:10:17 2012] [info] Initial (No.1) HTTPS request
>>> received
>>> > for child 79 (server XXXXXomittedXXXXXX:443)
>>> > ....
>>> > [Fri Sep 21 09:10:18 2012] [info] Initial (No.1) HTTPS request
>>> received
>>> > for child 27 (server XXXXXomittedXXXXXX:443)
>>> > [Fri Sep 21 09:10:18 2012] [info] mod_wsgi (pid=3691): Stopping
>>> process
>>> > 'XXusername-omittedXX'.
>>> > [Fri Sep 21 09:10:18 2012] [info] mod_wsgi (pid=3691): Destroying
>>> > interpreters.
>>> > [Fri Sep 21 09:10:18 2012] [info] mod_wsgi (pid=3691): Destroy
>>> > interpreter 'XXXXXomittedXXXXXX|'.
>>> > [Fri Sep 21 09:10:18 2012] [info] mod_wsgi (pid=3691): Cleanup
>>> > interpreter ''.
>>> > [Fri Sep 21 09:10:18 2012] [info] Initial (No.1) HTTPS request
>>> received
>>> > for child 61 (server XXXXXomittedXXXXXX:443)
>>> > ....
>>> >
>>> > [Fri Sep 21 09:10:18 2012] [info] Initial (No.1) HTTPS request
>>> received
>>> > for child 19 (server XXXXXomittedXXXXXX:443)
>>> > [Fri Sep 21 09:10:18 2012] [info] mod_wsgi (pid=3691): Terminating
>>> Python.
>>> > [Fri Sep 21 09:10:18 2012] [info] mod_wsgi (pid=3691): Python has
>>> shutdown.
>>> > [Fri Sep 21 09:10:18 2012] [info] mod_wsgi (pid=3845): Attach
>>> > interpreter ''.
>>> > [Fri Sep 21 09:10:18 2012] [info] mod_wsgi (pid=3845): Create
>>> > interpreter 'XXXXXomittedXXXXXX|'.
>>> > [Fri Sep 21 09:10:18 2012] [info] [client 192.168.146.19] mod_wsgi
>>> > (pid=3845, process='XXusername-omittedXX',
>>> > application='XXXXXomittedXXXXXX|'): Loading WSGI script
>>> > '/home/XXusername-omittedXX/scripts/django.wsgi'.
>>> > [Fri Sep 21 09:10:19 2012] [info] Initial (No.1) HTTPS request
>>> received
>>> > for child 85 (server XXXXXomittedXXXXXX:443)
>>> >
>>> >
>>> > Thanks for making all the way through this long post ;)
>>> >
>>> > --
>>> > You received this message because you are subscribed to the Google
>>> Groups
>>> > "modwsgi" group.
>>> > To view this discussion on the web visit
>>> > https://groups.google.com/d/msg/modwsgi/-/NSg2A39f7DYJ.
>>> > To post to this group, send email to [email protected].
>>> > To unsubscribe from this group, send email to
>>> > [email protected].
>>> > For more options, visit this group at
>>> > http://groups.google.com/group/modwsgi?hl=en.
>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "modwsgi" group.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msg/modwsgi/-/Lc4lrAOyyvwJ.
>>
>> To post to this group, send email to [email protected]<javascript:>
>> .
>> To unsubscribe from this group, send email to
>> [email protected] <javascript:>.
>> For more options, visit this group at
>> http://groups.google.com/group/modwsgi?hl=en.
>>
>
>
--
You received this message because you are subscribed to the Google Groups
"modwsgi" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/modwsgi.
For more options, visit https://groups.google.com/groups/opt_out.