Re: [modwsgi] Too many spawned processes of apache

Graham Dumpleton Sat, 10 Apr 2010 05:27:08 -0700

On 10 April 2010 03:52, Sharjeel Ahmed Qureshi
<[email protected]> wrote:
> I'm sending you the error logs of both the core as well as the app. Please
> find the files attached. They are over the past week and I did try changing
> the parameters a little bit in the meantime.
>
> As you'll notice, there are infact some socket connection timeout messages
> but they don't have a particular reason for appearing as such..
>
> Regards,
> Sharjeel
>


Discussion taken back to the list.

Can you send me what settings you are using for:

<IfModule mpm_prefork_module>
    StartServers          1
    MinSpareServers       1
    MaxSpareServers      10
    MaxClients          150
    MaxRequestsPerChild   0
</IfModule>

Is this server under a lot of load when you are restarting it. Unless
you have StartServers set very high, it seems to be starting a lot of
server child process on startup.

Also especially want to know if you are setting MaxRequestsPerChild.

Also, what is Timeout directive set to in Apache. By default it is 300 seconds.

In application file have the errors:

[Thu Apr 01 02:36:39 2010] [error] [client 127.0.0.1] (111)Connection
refused: mod_wsgi (pid=24782): Conn
ection attempt #1 to WSGI daemon process 'seenreport' on
'/var/run/apache2/wsgi.24773.0.1.sock' failed, s
leeping before retrying again.
[Thu Apr 01 02:36:39 2010] [error] [client 127.0.0.1] (111)Connection
refused: mod_wsgi (pid=24782): Conn
ection attempt #2 to WSGI daemon process 'seenreport' on
'/var/run/apache2/wsgi.24773.0.1.sock' failed, s
leeping before retrying again.
[Thu Apr 01 02:36:39 2010] [error] [client 127.0.0.1] (111)Connection
refused: mod_wsgi (pid=24782): Conn
ection attempt #3 to WSGI daemon process 'seenreport' on
'/var/run/apache2/wsgi.24773.0.1.sock' failed, s
leeping before retrying again.
[Thu Apr 01 02:36:39 2010] [error] [client 127.0.0.1] (111)Connection
refused: mod_wsgi (pid=24782): Conn
ection attempt #4 to WSGI daemon process 'seenreport' on
'/var/run/apache2/wsgi.24773.0.1.sock' failed, s
leeping before retrying again.
[Thu Apr 01 02:36:39 2010] [error] [client 127.0.0.1] (111)Connection
refused: mod_wsgi (pid=24782): Conn
ection attempt #5 to WSGI daemon process 'seenreport' on
'/var/run/apache2/wsgi.24773.0.1.sock' failed, s
leeping before retrying again.
[Thu Apr 01 02:36:39 2010] [error] [client 127.0.0.1] (111)Connection
refused: mod_wsgi (pid=24782): Conn
ection attempt #6 to WSGI daemon process 'seenreport' on
'/var/run/apache2/wsgi.24773.0.1.sock' failed, s
leeping before retrying again.
[Thu Apr 01 02:36:39 2010] [error] [client 127.0.0.1] (111)Connection
refused: mod_wsgi (pid=24782): Conn
ection attempt #7 to WSGI daemon process 'seenreport' on
'/var/run/apache2/wsgi.24773.0.1.sock' failed, s
leeping before retrying again.
[Thu Apr 01 02:36:39 2010] [error] [client 127.0.0.1] (111)Connection
refused: mod_wsgi (pid=24782): Conn
ection attempt #8 to WSGI daemon process 'seenreport' on
'/var/run/apache2/wsgi.24773.0.1.sock' failed, s
leeping before retrying again.
[Thu Apr 01 02:36:39 2010] [error] [client 127.0.0.1] (111)Connection
refused: mod_wsgi (pid=24782): Conn
ection attempt #9 to WSGI daemon process 'seenreport' on
'/var/run/apache2/wsgi.24773.0.1.sock' failed, s
leeping before retrying again.
[Thu Apr 01 02:36:39 2010] [error] [client 127.0.0.1] (111)Connection
refused: mod_wsgi (pid=24782): Conn
ection attempt #10 to WSGI daemon process 'seenreport' on
'/var/run/apache2/wsgi.24773.0.1.sock' failed,
sleeping before retrying again.
[Thu Apr 01 02:36:39 2010] [error] [client 127.0.0.1] (111)Connection
refused: mod_wsgi (pid=24782): Conn
ection attempt #11 to WSGI daemon process 'seenreport' on
'/var/run/apache2/wsgi.24773.0.1.sock' failed,
sleeping before retrying again.
[Thu Apr 01 02:36:39 2010] [error] [client 127.0.0.1] (111)Connection
refused: mod_wsgi (pid=24782): Conn
ection attempt #12 to WSGI daemon process 'seenreport' on
'/var/run/apache2/wsgi.24773.0.1.sock' failed,
sleeping before retrying again.
[Thu Apr 01 02:36:39 2010] [error] [client 127.0.0.1] (111)Connection
refused: mod_wsgi (pid=24782): Conn
ection attempt #13 to WSGI daemon process 'seenreport' on
'/var/run/apache2/wsgi.24773.0.1.sock' failed,
sleeping before retrying again.
[Thu Apr 01 02:36:39 2010] [error] [client 127.0.0.1] (111)Connection
refused: mod_wsgi (pid=24782): Conn
ection attempt #14 to WSGI daemon process 'seenreport' on
'/var/run/apache2/wsgi.24773.0.1.sock' failed,
sleeping before retrying again.
[Thu Apr 01 02:36:39 2010] [error] [client 127.0.0.1] (111)Connection
refused: mod_wsgi (pid=24782): Unab
le to connect to WSGI daemon process 'seenreport' on
'/var/run/apache2/wsgi.24773.0.1.sock' after multipl
e attempts.

This is showing something is wrong for a start. What happens is that code has:

    apr_interval_time_t timer = apr_time_from_sec(0.1);

Then in a loop for each time it fails it does:

                apr_sleep(timer);
                if (timer < apr_time_from_sec(2))
                    timer *= 2;

So, it should back off progressively and sleep longer and longer
between attempts, with at most about 2 seconds at maximum. It should
bail out after 15 attempts specifically for connection refused
messages.

If you look at the log messages though, all the messages are within
the same second, so must be something wrong with code, but I can't see
what it is on first inspection.

The issue is why one would get connection refused and only
explaination for that is that the listener socket listen queue was
already full.

This would occur if all threads in all daemon process group processes
were busy and nothing able to accept new requests and the listen queue
had filled up with further pending requests. So, 100 concurrent
requests and more waiting. Alternatively the UNIX listener socket
itself was somehow broken completely.

Another oddity is this:

[Wed Apr 07 21:31:01 2010] [info] mod_wsgi (pid=5612): Cleanup interpreter ''.
[Wed Apr 07 21:31:01 2010] [info] mod_wsgi (pid=5612): Terminating Python.
[Wed Apr 07 21:31:02 2010] [info] mod_wsgi (pid=5595): Cleanup interpreter ''.
[Wed Apr 07 21:31:02 2010] [info] mod_wsgi (pid=5595): Terminating Python.
[Wed Apr 07 21:31:03 2010] [info] mod_wsgi (pid=4984): Cleanup interpreter ''.
[Wed Apr 07 21:31:03 2010] [info] mod_wsgi (pid=4984): Terminating Python.
[Wed Apr 07 21:31:04 2010] [info] mod_wsgi (pid=5596): Cleanup interpreter ''.
[Wed Apr 07 21:31:04 2010] [info] mod_wsgi (pid=5596): Terminating Python.
[Wed Apr 07 21:31:05 2010] [info] mod_wsgi (pid=4346): Cleanup interpreter ''.
[Wed Apr 07 21:31:05 2010] [info] mod_wsgi (pid=4346): Terminating Python.
[Wed Apr 07 21:31:06 2010] [info] mod_wsgi (pid=4347): Cleanup interpreter ''.
[Wed Apr 07 21:31:06 2010] [info] mod_wsgi (pid=4347): Terminating Python.
[Wed Apr 07 21:31:07 2010] [info] mod_wsgi (pid=5590): Cleanup interpreter ''.
[Wed Apr 07 21:31:07 2010] [info] mod_wsgi (pid=5590): Terminating Python.
[Wed Apr 07 21:31:08 2010] [info] mod_wsgi (pid=4390): Cleanup interpreter ''.
[Wed Apr 07 21:31:08 2010] [info] mod_wsgi (pid=4390): Terminating Python.
[Wed Apr 07 21:31:09 2010] [info] mod_wsgi (pid=5609): Cleanup interpreter ''.
[Wed Apr 07 21:31:09 2010] [info] mod_wsgi (pid=5609): Terminating Python.
[Wed Apr 07 21:31:10 2010] [info] mod_wsgi (pid=5610): Cleanup interpreter ''.
[Wed Apr 07 21:31:10 2010] [info] mod_wsgi (pid=5610): Terminating Python.
[Wed Apr 07 21:31:11 2010] [info] mod_wsgi (pid=4983): Cleanup interpreter ''.
[Wed Apr 07 21:31:11 2010] [info] mod_wsgi (pid=4983): Terminating Python.

These aren't daemon processes but Apache child server processes. You
will note they are one second apart. This is likely indicating that in
Apache server maintenance loop, which is once a second, it is
determining these processes are excess and killing them off.

You will see though at the same time:

[Wed Apr 07 21:31:03 2010] [error] [client 127.0.0.1] mod_wsgi
(pid=5614): Exception occurred processing WSGI script
'/home/www/seenreport.com/apache2/django.wsgi'.
[Wed Apr 07 21:31:03 2010] [error] [client 127.0.0.1] IOError: client
connection closed
[Wed Apr 07 21:31:03 2010] [error] [client 127.0.0.1] mod_wsgi
(pid=5614): Exception occurred processing WSGI script
'/home/www/seenreport.com/apache2/django.wsgi'.
[Wed Apr 07 21:31:03 2010] [error] [client 127.0.0.1] IOError: client
connection closed
[Wed Apr 07 21:31:05 2010] [error] [client 127.0.0.1] mod_wsgi
(pid=5614): Exception occurred processing WSGI script
'/home/www/seenreport.com/apache2/django.wsgi'.
[Wed Apr 07 21:31:05 2010] [error] [client 127.0.0.1] IOError: client
connection closed

This is on the daemon process side and possibly indicates that Apache
has killed off child server processes which were still actually
handling requests and so interrupted active requests.

Apache shouldn't as far as I know do that unless the scoreboard
accounting it does (nothing to do with mod_wsgi) is broken.

What patch revision of Apache 2.2 are you using. Is it the latest?

You are actually seeing:

[Wed Apr 07 21:30:42 2010] [info] mod_wsgi (pid=4317): Daemon process
deadlock timer expired, stopping process 'seenreport'.

quite a lot and many times a lot of the issues on server child process
are just before that, possibly because their timeouts based on Timeout
directive are occuring just before as both probably 300 seconds.

Since you are using MySQL.

[Thu Apr 01 05:49:44 2010] [error] Exception exceptions.TypeError:
"'NoneType' object is not callable" in <bound method Cursor.__del__ of
<MySQLdb.cursors.Cursor object at 0
x1c2be90>> ignored

I would make sure you are using latest MySQL Python wrapper.

Also make sure you are not running mod_php in same web server, because
if you are and it is loading MySQL support, the MySQL library it is
loading may not be compatible with what Python MySQL wrapper is
expecting. This can cause lots of issues, albeit usually crashes.

Overall, it still looks like a problem with Python C extension module.
Many of the errors are possibly coming about because of timeouts
occuring on communications between Apache server child processes and
daemon process group processes because of daemon processes being
locked up due to GIL deadlock.

Graham


> On Thu, Apr 8, 2010 at 6:04 PM, Graham Dumpleton
> <[email protected]> wrote:
>>
>> On 8 April 2010 22:35, Sharjeel Ahmed Qureshi
>> <[email protected]> wrote:
>> > There is no particular reason for using Prefork except it is the default
>> > with apache and I've heard it is safer than Worker. We are not using
>> > mod_php
>> > at all.
>>
>> Prefork is only required if using something like mod_php as it is
>> single threaded.
>>
>> If all you are using the Apache for is to use mod_wsgi and not doing
>> anything else, quite okay and much better to use worker MPM. There are
>> no reliability issues and that is multithreaded will not matter to
>> Python code as not running Python code in the Apache server child
>> processes which the MPM controls.
>>
>> > Earlier I was getting a warning that wsgi is compiled for Python 2.5.1
>> > but
>> > Python2.5.2 is being used; I fixed it by re-compiling mod_wsgi to an
>> > upgraded version.
>>
>> Shouldn't matter so long as Python using shared library.
>>
>> > My application uses a lot of packages. I'll have to make a list to see
>> > which
>> > of them use C extension.
>> >
>> > I'm not sure about that 5 minute window; when my application goes down
>> > it
>> > says down for hours until manually apache's spawned processes are killed
>> > and
>> > apache is restarted. If 5 minute is the default timeout, then atleast
>> > application should get back in some amount of time (be that hours).
>>
>> That is only the timeout for a Python GIL deadlock.
>>
>> The 'inactivity-timeout' option I mentioned is another more general
>> timeout value you can specify.
>>
>> What it does is that if no requests at all for the specified period,
>> will restart daemon process just to take memory back to base level.
>>
>> It also though monitors active requests and if all active requests
>> don't consume request content, or generate response content for the
>> specified period, will assume things are hung and trigger a process
>> restart.
>>
>> This latter situation should pick up your situation unless the whole
>> process is borked in some way to extent that even the pure C
>> background thread which detects these conditions cant run. It would
>> take something pretty bad for that too happen such as resource
>> exhaustion.
>>
>> If that doesn't help, then have seen one other case where for some
>> reason it looked like Linux itself was stuffing up. For the socket
>> connection to daemon process, Apache server process was seeing a
>> successful connect message, but listener in daemon process was never
>> seeing a socket connection request. I made a change in mod_wsgi 2.7
>> related to timeouts on that connect though to at least recover from
>> the problem. I would expect slightly different error messages though
>> if that was occuring. It could though result in the daemon process
>> appearing not to hang as seems that when process gets in that stuffed
>> state, that it never recovers and all requests simply keep failing
>> after the connect timeout.
>>
>> Are you seeing any error messages related to a socket timeout?
>>
>> Maybe if you can bundle up some complete examples of Apache error log
>> files and email them to my personal email I can look at it. That way
>> may pick up other error messages which you might not think are related
>> but are.
>>
>> Given that have heard of that odd behaviour once before, would be good
>> to track it down a bit more. Last time it was quite hard to even work
>> that much out as like know was relying on person with problem to give
>> me what they thought was important from logs. If I can get the whole
>> logs will help immensely as can look for all sorts of messages and
>> also look at the timing between different messages in error logs to
>> perhaps work out what is going on.
>>
>> > There is one more thing; for certain task we spawn a python thread which
>> > does some task in background (e.g. sending mail through Yahoo/GMail)
>> > while
>> > it immediately gives back response to the user. Could that be a problem?
>>
>> Hard to tell.
>>
>> Graham
>>
>> > On Thu, Apr 8, 2010 at 5:22 PM, Graham Dumpleton
>> > <[email protected]> wrote:
>> >>
>> >> On 8 April 2010 22:13, Sharjeel Ahmed Qureshi
>> >> <[email protected]> wrote:
>> >> > Graham,
>> >> >
>> >> > Thanks for the info. I have checked both logs, core as well as the
>> >> > vhost's
>> >> > logs. Yes, I've been lately getting this as well but they weren't
>> >> > appearing
>> >> > few days back, although problem has been appearing for the past many
>> >> > days:
>> >> >
>> >> > [info] mod_wsgi (pid=xxxxx): Daemon process deadlock timer expired,
>> >> > stopping
>> >> > process 'seenreport'.
>> >>
>> >> If you are getting that then it would normally mean that some Python
>> >> extension module (ie., C code), is blocking indefinitely but not
>> >> releasing the Python GIL while doing so. This causes all request
>> >> threads to lock up. By default this timeout occurs after 5 minutes and
>> >> is basically a fail safe to cope with buggy extension modules.
>> >>
>> >> What third party Python packages are you using which have C extension
>> >> modules as part of them?
>> >>
>> >> Another possible cause is mismatches in shared library versions used
>> >> by Apache, Apache modules and Python modules.
>> >>
>> >> As I asked before, are you loading mod_php into the same Apache. That
>> >> often causes shared library conflicts with different versions and/or
>> >> use of non thread safe versions of libraries.
>> >>
>> >> If you can catch the process in that 5 minute window while daemon
>> >> process is hung, you can use recipe at end of:
>> >>
>> >>
>> >>
>> >> http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Debugging_Crashes_With_GDB
>> >>
>> >> at work out where all the C threads are when it is stuck. This can
>> >> indicate which C extension module might be the problem.
>> >>
>> >> Graham
>> >>
>> >> > I did get some segmentation faults earlier but somehow they stopped
>> >> > appearing now. They were like this:
>> >> >
>> >> > [notice] seg fault or similar nasty error detected in the parent
>> >> > process
>> >> >
>> >> > Also, I get these messages:
>> >> >
>> >> > [error] server reached MaxClients setting, consider raising the
>> >> > MaxClients
>> >> > setting
>> >> >
>> >> > Prolly the last one is because all the processes get stucked and
>> >> > hence
>> >> > the
>> >> > server stops responding.
>> >> >
>> >> > I've changed the number of processes of WSGIDaemon now. Previously
>> >> > they
>> >> > were
>> >> > 1 and now they are 10. The config now looks like this:
>> >> >
>> >> > Here are my WSGI related lines:
>> >> >
>> >> >               WSGIScriptAlias /
>> >> > /home/www/site.com/apache2/django.wsgi
>> >> >               WSGIDaemonProcess snr user=iwitness group=iwitness
>> >> > processes=10 threads=10 display-name=apache-wsgi
>> >> >               WSGIProcessGroup snr
>> >> >               <Directory /home/www/site.com/apache2/django.wsgi>
>> >> >               Order allow,deny
>> >> >               Allow from all
>> >> >               </Directory>
>> >> >
>> >> > Sharjeel
>> >> >
>> >> > On Thu, Apr 8, 2010 at 2:36 PM, Graham Dumpleton
>> >> > <[email protected]> wrote:
>> >> >>
>> >> >> On 8 April 2010 00:02, Sharjeel Qureshi <[email protected]>
>> >> >> wrote:
>> >> >> > I am having problems with my Django Based application. The server
>> >> >> > stops responding occasionally, starts creating lot of processes
>> >> >> > until
>> >> >> > it is restarted. Even when I shutdown apache, lot of apache
>> >> >> > processes
>> >> >> > can be seen with "ps -aux | grep apache" command so I also have to
>> >> >> > do
>> >> >> > "sudo killall -v apache2".
>> >> >> >
>> >> >> > My setup is on a 64-bit Ubuntu 8.04 machine. Main webserver is
>> >> >> > nginx
>> >> >> > (which gives a 502 server timeout when it happens). It is
>> >> >> > connected
>> >> >> > to
>> >> >> > Apache 2.2 via reverse proxy. Apache is using MPM-Prefork and runs
>> >> >> > the
>> >> >> > mod_wsgi in daemon mode with a single process. The application is
>> >> >> > using django 1.1
>> >> >> >
>> >> >> > Here is my system configuration
>> >> >> >
>> >> >> > Ubuntu 6.04
>> >> >> > nginx for static content, reverse proxied with apache (nginx stays
>> >> >> > stable and gives 502 server timeout)
>> >> >> > Apache 2.2, mpm-prefork, maxclients=150
>> >> >> > Mod_wsgi 2.7, processes=1
>> >> >> > mod_python is NOT installed
>> >> >> > django 1.1
>> >> >> >
>> >> >> > I usually don't get more than 10 requests at each time but the
>> >> >> > process
>> >> >> > count becomes way higher than it should.
>> >> >> >
>> >> >> > I've changed the log level of both core apache as well as the
>> >> >> > virtual
>> >> >> > host. In the errors log I've found following:
>> >> >> >
>> >> >> > a) [error] [client 127.0.0.1] mod_wsgi (pid=23988): Exception
>> >> >> > occurred
>> >> >> > processing WSGI script
>> >> >> > '/home/www/example.com/apache2/django.wsgi'.
>> >> >> > b) [error] [client 127.0.0.1] IOError: failed to write data
>> >> >> > c) [info] [client 127.0.0.1] (32)Broken pipe: core_output_filter:
>> >> >> > writing data to the network
>> >> >> > d) [error] [client 127.0.0.1] (4)Interrupted system call: mod_wsgi
>> >> >> > (pid=24215): Unable to connect to WSGI daemon process 'snr' on
>> >> >> > '/var/
>> >> >> > run/apache2/wsgi.23609.0.1.sock' after multiple attempts.
>> >> >> > e) lots of: [error] [client 127.0.0.1] Premature end of script
>> >> >> > headers: django.wsgi
>> >> >> >
>> >> >> > I have similar setup running on three other instances but never
>> >> >> > had
>> >> >> > problems on them strangely!
>> >> >> >
>> >> >> > Any ideas?
>> >> >>
>> >> >> Are you ever seeing any of the messages:
>> >> >>
>> >> >>  Daemon process deadlock timer expired, stopping process ...
>> >> >>
>> >> >>  Segmentation Fault
>> >> >>
>> >> >> When you see the message:
>> >> >>
>> >> >>  Premature end of script headers
>> >> >>
>> >> >> are you seeing messages indicating that the daemon process has
>> >> >> restarted?
>> >> >>
>> >> >> Ensure you check both main Apache error log and any virtual host
>> >> >> specific error log for messages.
>> >> >>
>> >> >> At least while work this out, you can add option:
>> >> >>
>> >> >>  inactivity-timeout=120
>> >> >>
>> >> >> to WSGIDaemonProcess and should hopefully cause process to restart
>> >> >> automatically after being stuck for 2 minutes. At least then if you
>> >> >> can't get to it quickly yourself to restart Apache, it should fix
>> >> >> itself up.
>> >> >>
>> >> >> BTW, do you have a specific need to run prefork MPM? If you don't,
>> >> >> ie,
>> >> >> not running mod_php, you would be better off with worker MPM.
>> >> >>
>> >> >> Also post the actual existing WSGIDaemonProcess/WSGIProcessGroup
>> >> >> lines
>> >> >> you are using now.
>> >> >>
>> >> >> Graham
>> >> >>
>> >> >> --
>> >> >> You received this message because you are subscribed to the Google
>> >> >> Groups
>> >> >> "modwsgi" group.
>> >> >> To post to this group, send email to [email protected].
>> >> >> To unsubscribe from this group, send email to
>> >> >> [email protected].
>> >> >> For more options, visit this group at
>> >> >> http://groups.google.com/group/modwsgi?hl=en.
>> >> >>
>> >> >
>> >> > --
>> >> > You received this message because you are subscribed to the Google
>> >> > Groups
>> >> > "modwsgi" group.
>> >> > To post to this group, send email to [email protected].
>> >> > To unsubscribe from this group, send email to
>> >> > [email protected].
>> >> > For more options, visit this group at
>> >> > http://groups.google.com/group/modwsgi?hl=en.
>> >> >
>> >>
>> >> --
>> >> You received this message because you are subscribed to the Google
>> >> Groups
>> >> "modwsgi" group.
>> >> To post to this group, send email to [email protected].
>> >> To unsubscribe from this group, send email to
>> >> [email protected].
>> >> For more options, visit this group at
>> >> http://groups.google.com/group/modwsgi?hl=en.
>> >>
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups
>> > "modwsgi" group.
>> > To post to this group, send email to [email protected].
>> > To unsubscribe from this group, send email to
>> > [email protected].
>> > For more options, visit this group at
>> > http://groups.google.com/group/modwsgi?hl=en.
>> >
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "modwsgi" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected].
>> For more options, visit this group at
>> http://groups.google.com/group/modwsgi?hl=en.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"modwsgi" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/modwsgi?hl=en.

Re: [modwsgi] Too many spawned processes of apache

Reply via email to