Sorry for the slow reply. I've been trying to catch up with stuff after my extended holiday, and the reply to this is going to be a bit involved.
First thing I want to comment on is:

> Here's my apache config, running worker mpm.
>
>   StartServers         10
>   MaxClients          400
>   MinSpareThreads      25
>   MaxSpareThreads      75
>   ThreadsPerChild      25
>   MaxRequestsPerChild   0
>
> I started out with the defaults (StartServers=2 and MaxClients=150) but our
> site slowed way down under minimal load. I'm guessing it took a long time to
> spin up servers as requests came in. We're serving 90% of our media from s3.
> The other 10% are served through Apache on our https pages or someone
> pointing lazily to our local server. At nominal load, 15 worker processes end
> up being created, so I'm thinking I should probably just set StartServers=15?
> With this configuration I'm assuming I have 15 worker processes running
> (which I can confirm with NewRelic) with 25 threads each (which I don't know
> how to confirm, guessing 400/15).

Since you are using daemon mode of mod_wsgi, there are a few things wrong about what you are saying here.

In daemon mode, your WSGI application is going to be running in a separate set of processes. The settings above pertain to the Apache child worker processes and not the daemon processes in which the WSGI application is running. When daemon mode is used, the Apache child worker processes only serve the purpose of proxying web requests through to the separate daemon mode processes in which the WSGI application is actually running.

The forking of additional Apache child worker processes should be fast because no application needs to be loaded into them. Because of the way Apache works out whether to create more processes, it can be slightly delayed, but that shouldn't be the case if load is minimal.

As to New Relic confirming the number of processes, what you are saying can't be right there either, as the New Relic Python agent will be running in the daemon processes along with your WSGI application, again separate from the Apache child worker processes that the above directives control. The number of instances you see reporting under New Relic is going to be dictated by the processes option of the WSGIDaemonProcess directive.

For the specific values you have changed the directives above to, there are also some problems. You at least didn't break one of the major rules, which is that MinSpareThreads and MaxSpareThreads should be a multiple of ThreadsPerChild, but there is one issue. Whether you are hit by this issue does depend a bit on the load on your server at the time Apache is started/restarted. To try and illustrate, see the charts at:

  https://skitch.com/grahamdumpleton/es5mq/figure-1

In the available processes chart, you can see how a StartServers value of 10 results in 10 processes being created initially. This value for StartServers is potentially in conflict with MaxSpareThreads though. If you had no load at all on your server, then when the process maintenance cycle of Apache starts up, the 10 processes represent 250 threads, which is greater than the 75 maximum allowed for spare threads. As a result, Apache will start killing off processes at a rate of 1 per second, only stopping when it reaches 3 processes.

For a server that is typically only lightly loaded, this means you are creating more processes than you need, which will very quickly be killed off again. When server load does increase, as indicated by the number of concurrent requests, you can see how Apache will start to create more processes. As load decreases, it will start killing off processes it doesn't need.
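To make the arithmetic behind those numbers explicit, here is the same thing annotated against the relevant directives. The values are just the ones quoted above, nothing new, but it shows why the idle floor ends up at 3 processes:

  # 10 processes x 25 threads each = 250 threads sitting idle at startup
  StartServers        10
  ThreadsPerChild     25
  # 250 idle threads > 75 allowed spare threads, so processes are culled
  # at 1 per second until 75 / 25 = 3 processes remain
  MaxSpareThreads     75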
Eventually, if you return to an idle state, you will end up back at 3 processes, corresponding to the 75 maximum idle threads.

So that you have a comparison, I will show you what you get for a couple of other configurations.

  StartServers          2
  MaxClients          150
  MinSpareThreads      25
  MaxSpareThreads      75
  ThreadsPerChild      25

This is the standard configuration supplied in extra/httpd-mpm.conf for worker in an Apache Software Foundation distribution of Apache. Although it ships in the distribution, for whatever reason that file isn't included by default and so it isn't actually used. The result of it however is:

  https://skitch.com/grahamdumpleton/es5cr/figure-1

The configuration here is such that it starts a number of processes for which the number of threads falls between the minimum and maximum spare threads. That way, if there is no load, the number of processes stays as it was at startup and processes don't get killed off straight away. If load increases and the number of processes goes over what corresponds to the maximum spare threads, then when the load drops, Apache will start to kill off the extra processes, but only down to the number of processes that makes up that maximum spare threads.

So in effect, MinSpareThreads is what controls how quickly additional processes are started up as load increases, with MaxSpareThreads representing a lower bound on the number of processes, except that initially there can be fewer processes than that lower bound dictates, until Apache sees from the load that they are actually required.

In your case therefore, if you really wanted to start up 15 servers initially, and didn't want them to be shut down if not actually required, you would need to use:

  StartServers         15
  MaxClients          400
  MinSpareThreads      25
  MaxSpareThreads     375
  ThreadsPerChild      25

which would yield:

  https://skitch.com/grahamdumpleton/eshyg/figure-1

The next issue is with:

> My apache/mod_wsgi directives look like this:
>
>   <VirtualHost *:80>
>   # Some stuff
>   WSGIDaemonProcess app1 user=http group=http processes=10 threads=20
>   WSGIProcessGroup app1
>   WSGIApplicationGroup app1
>   WSGIScriptAlias / /path/to/django.wsgi
>   WSGIImportScript /path/to/django.wsgi process-group=app1 application-group=app1
>   # Some more stuff
>   </VirtualHost>
>
>   <VirtualHost *:443>
>   # Some stuff
>   WSGIDaemonProcess app1-ssl user=http group=http processes=2 threads=20
>   WSGIProcessGroup app1-ssl
>   WSGIApplicationGroup app1-ssl
>   WSGIScriptAlias / /path/to/django.wsgi
>   WSGIImportScript /path/to/django.wsgi process-group=app1-ssl application-group=app1-ssl
>   # Some more stuff
>   </VirtualHost>
>
> Having a different WSGIDaemonProcess/WSGIProcessGroup for the ssl side of my
> site, well, that just doesn't feel right at all. I'm 100% sure I've mucked
> something up here. To the greater point though, I've allocated 200+40 threads
> for mod_wsgi to handle requests from Apache, leaving 160 threads to deal with
> whatever media needs to be delivered up (through ssl or laziness of not
> pointing to s3).

Unless you have a specific requirement to segregate code running under an SSL request from code running under a non SSL request, you should be configuring the SSL side to use the same daemon process group. You have two choices there.

The first is to move the WSGIDaemonProcess directive outside of the VirtualHost and have the WSGIProcessGroup of both VirtualHosts refer to the one daemon process group. This does however allow arbitrary VirtualHosts to delegate WSGI applications to that daemon process group (a sketch of this option follows below).
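If you did want to go that first route, a rough sketch of it, reusing the names and values from your existing configuration and showing only the process group sharing, would be:

  # Defined at server scope (outside any VirtualHost), so both virtual
  # hosts below can share it; note that any other VirtualHost could
  # delegate to it as well, which is the drawback mentioned above.
  WSGIDaemonProcess app1 user=http group=http processes=10 threads=20

  <VirtualHost *:80>
  # Some stuff
  WSGIProcessGroup app1
  WSGIScriptAlias / /path/to/django.wsgi
  # Some more stuff
  </VirtualHost>

  <VirtualHost *:443>
  # Some stuff
  WSGIProcessGroup app1
  WSGIScriptAlias / /path/to/django.wsgi
  # Some more stuff
  </VirtualHost>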
Instead, you can therefore define the WSGIDaemonProcess directive in the first of the two VirtualHosts and refer to it from the other. This second option is possible because mod_wsgi will specifically allow the process group reference to reach across to a daemon process group defined by a WSGIDaemonProcess directive in another VirtualHost when the ServerName directive is the same.

  <VirtualHost *:80>
  # Some stuff
  WSGIDaemonProcess app1 user=http group=http processes=10 threads=20
  WSGIScriptAlias / /path/to/django.wsgi process-group=app1 application-group=%{GLOBAL}
  # Some more stuff
  </VirtualHost>

  <VirtualHost *:443>
  # Some stuff
  WSGIScriptAlias / /path/to/django.wsgi process-group=app1 application-group=%{GLOBAL}
  # Some more stuff
  </VirtualHost>

Note that I have changed it to use the process-group and application-group options of WSGIScriptAlias. This makes the configuration slimmer, and defining both options with WSGIScriptAlias has the side effect of preimporting the script, the same as WSGIImportScript did.

I also set the application group to %{GLOBAL} so it forces use of the main interpreter in the daemon process group's processes. Using the main interpreter means a little less memory, since the main interpreter is created all the time anyway, even when a sub interpreter is used. Plus, some extension modules for Python only work in the main interpreter, so it is a good safeguard.

So that should eliminate the memory for 2 processes, plus save a bit more from not using an additional sub interpreter and instead just using the main one. The end result is that New Relic should only report 10 instances, whereas before it would have reported 12.

As to whether the number of processes/threads for the daemon process group is right or not, that is going to depend on your application. For that, I would need to know your New Relic account ID so I could go look at it. Specifically, I would be looking at the capacity analysis report (requires version 1.5.0.1.103 of the New Relic Python agent) as well as creating some custom dashboards to chart some extra metrics about the threads in use.

For details of the capacity analysis report for the New Relic Python agent read:

  http://blog.newrelic.com/2012/09/11/introducing-capacity-analysis-for-python/

For further information I also suggest you watch:

  http://lanyrd.com/2012/pycon/spcdg/
  http://lanyrd.com/2012/pycon-au/swkdq/

Anyway, let me know the New Relic account ID in a private email and I can have a look and then comment further here, if you don't mind me using your case as a tuning example.

Graham

--
You received this message because you are subscribed to the Google Groups "modwsgi" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/modwsgi?hl=en.
