Re: [modwsgi] Lazy Initilisation of Python (mod_wsgi, django)

Graham Dumpleton Wed, 19 Jun 2013 04:02:03 -0700

Brain dump mode on.

At this point I am a bit confused about what specifically you are having a 
problem with, but let me explain in detail how startup of processes and loading 
of applications occurs.

Firstly, when Apache starts up, in its parent process it will load the mod_wsgi 
module for Apache. This is linked to the Python runtime library and thus the 
Python library is also loaded.

Under mod_wsgi 3.X, the Python interpreter is not itself initialised in the 
Apache parent process. Prior to mod_wsgi 3.0 it was initialised in the Apache 
parent process, but due to how recent Python versions are implemented, 
initialising Python itself in the Apache parent process can result in memory 
growth in the Apache parent process when an Apache restart/reload is done. This 
is because Python doesn't clean up after itself properly when the interpreter 
is destroyed. In other words, it leaks memory.

The reason we even get into this situation with mod_wsgi 2.X is because on an 
Apache restart/reload, the mod_wsgi module is told to clean up after itself. 
This in turn causes the Python interpreter to be destroyed. Apache then unloads 
the mod_wsgi module from the process as well as the Python library. After 
rereading the configuration and seeing that mod_wsgi is still needed, it will 
load mod_wsgi again and reinitialise the Python interpreter. The problem is 
that because the Python library was unloaded, when initialisation is done 
again, it is against freshly zeroed out memory. Thus the memory objects that 
Python leaves around and which it didn't delete properly, can't be reused when 
the interpreter is initialised again as would be the case if the interpreter 
was created/destroyed and created again in same process memory space.

Anyway, the point of describing this is to indicate that using mod_wsgi 2.X is 
probably not a good idea at this point and mod_wsgi 3.X should always be used. 
Unfortunately Ubuntu 10.04 LTS only provides mod_wsgi 2.8 and so many people 
still use it. The risk in using mod_wsgi 2.X is that if you do many 
restart/reloads of Apache, rather than a complete stop/start, then that parent 
process can grow.

Now, once the Apache parent process is set up, it will then fork its child 
worker processes. These are the processes which then handle HTTP requests. The 
number of these is dictated by the MPM settings. Because these are forks of the 
parent, if that parent process does grow in size due to the above issues, then 
all the worker processes will in turn be using more memory. That is a further 
reason not to use mod_wsgi 2.X.

Although allowing Python to be initialised in the parent process is a bad idea 
now, the WSGILazyInitialization directive still exists. This defaults to On. 
When set to On, the Python interpreter will not be initialised in the parent. 
This is the default behaviour in mod_wsgi 3.X. You could set it to Off and 
restore the mod_wsgi 2.X behaviour, but don't go there.

With the lazy initialisation of the Python interpreter, once those Apache child 
worker processes are forked, only then will the Python interpreter be 
initialised. In being initialised, the main Python interpreter context will be 
created. This is equivalent to having used the command line Python.

If you are using WSGIDaemonProcess directive and delegating your WSGI 
applications to run in those separate process groups, then no actual WSGI 
application will run in the Apache child worker processes. This means that you 
don't actually need to have the Python interpreter be initialised in the Apache 
child worker processes. Doing so will just waste CPU and slow down setup of the 
Apache child worker processes, delaying how long before they can start 
accepting HTTP requests.

Thus, if you are using WSGIDaemonProcess directive and always delegating WSGI 
applications to the daemon process groups, then set:

WSGIRestrictEmbedded On

This will prevent initialisation of the Python interpreter in the Apache child 
worker processes, saving CPU and memory in those processes.

I talk about this whole problem more in:

http://blog.dscpl.com.au/2009/11/save-on-memory-with-modwsgi-30.html

With this directive set, you will also get an error if you have managed to 
stuff up the configuration when using daemon process groups and haven't 
actually delegated the WSGI application to run in the daemon process group you 
created. This is actually really common because there are some stupid blog 
posts out there which are wrong.

In short, if you have:

WSGIDaemonProcess group-name

you must either be setting:

WSGIProcessGroup group-name

in an appropriate context, to ensure that requests for the WSGI application 
are handled in the daemon process group, or, if using mod_wsgi 3.X, you can 
instead use the process-group option to WSGIScriptAlias. Thus:

WSGIScriptAlias / /some/path/wsgi.py process-group=group-name

If you don't have one of these being applied, then your WSGI application will 
still run in the Apache child worker processes, making the WSGIDaemonProcess 
directive pointless.

So, if only using daemon mode, make sure you set WSGIRestrictEmbedded and set 
it to On.

You can check whether a WSGI application is running in daemon mode by using 
the test in:

http://code.google.com/p/modwsgi/wiki/CheckingYourInstallation#Embedded_Or_Daemon_Mode
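
A rough sketch of that kind of check follows. It relies on the 
'mod_wsgi.process_group' key that mod_wsgi adds to the WSGI environ. Using 
.get() with a default is only an assumption made here so the script can also be 
exercised outside Apache.

```python
def application(environ, start_response):
    # mod_wsgi adds this key to the environ. An empty string means the
    # request is being handled in embedded mode, not a daemon process group.
    # .get() is used only so the script also works outside mod_wsgi.
    group = environ.get("mod_wsgi.process_group", "")
    mode = "daemon" if group else "embedded"

    text = "mod_wsgi.process_group = %r (%s mode)\n" % (group, mode)
    body = text.encode("utf-8")

    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Mount it temporarily with WSGIScriptAlias in the same context as the real 
application and request it from a browser.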

You can also ensure you set:

LogLevel info

and mod_wsgi will log messages about when it is loading WSGI scripts and will 
tell you which process group it is loading them in. If you see an empty string 
for the process group, the script is actually running in the Apache child 
worker processes, or what is referred to as embedded mode.
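
If the error log is large, scanning it by eye for an empty process group gets 
tedious. The following is a minimal sketch of a scanner. It assumes the 'info' 
messages have the form seen in the log quoted further down in this thread 
(pid=..., process='...', application='...'): Loading WSGI script '...'), and 
other mod_wsgi versions may word them differently. The function names are made 
up for illustration.

```python
import re

# Pattern modelled on the "Loading WSGI script" message in the quoted log.
LOADING = re.compile(
    r"mod_wsgi \(pid=(?P<pid>\d+), process='(?P<process>[^']*)', "
    r"application='(?P<application>[^']*)'\): "
    r"Loading WSGI script '(?P<script>[^']*)'"
)


def find_script_loads(lines):
    """Yield (pid, process_group, application_group, script) per load message."""
    for line in lines:
        match = LOADING.search(line)
        if match:
            yield (
                int(match.group("pid")),
                match.group("process"),
                match.group("application"),
                match.group("script"),
            )


def embedded_loads(lines):
    """Return only the loads with an empty process group (embedded mode)."""
    return [load for load in find_script_loads(lines) if load[1] == ""]
```

Feeding it an open error log file, e.g. embedded_loads(open(path)), gives the 
scripts that were loaded in the Apache child worker processes.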

Now no matter whether running in embedded mode or daemon mode, by default each 
mounted WSGI application, whether set up by WSGIScriptAlias or 
AddHandler/SetHandler, will run in a separate interpreter within each process.

When Python is first initialised in a process, be that the Apache child worker 
processes or daemon mode processes, it will always create the main interpreter 
context, the one which is equivalent to what you get when running the command 
line interpreter.

This main interpreter by default isn't actually used. Instead each WSGI 
application is run in a separate sub interpreter created for it within the same 
process. Although the main interpreter is created when the process is first 
forked and Python initialised, these sub interpreters are not. Instead these 
application specific sub interpreters are only created on demand when the first 
request comes in for a WSGI application.

This is done because by default it is not possible to know in advance, when 
using AddHandler/SetHandler, what the name of the sub interpreter context to 
create should be. One cannot know what WSGI scripts exist, since the mapping 
to the WSGI script file in the file system is dynamic.

The existence of WSGIProcessGroup, and the fact that the process group can also 
be set dynamically through mod_rewrite rules, means that even for 
WSGIScriptAlias you can't be sure what to create in advance.

End result is that the sub interpreter for a specific WSGI application is only 
created on demand, in the process handling the request, the first time a 
request is handled by that WSGI application. Thus the sub interpreter is lazily 
created, which is why at 'info' logging level you will see mod_wsgi log the 
creation of the sub interpreter only on the first request.

If you are using daemon process groups and only one WSGI application runs in 
each daemon process group, you can avoid this lazy creation of the sub 
interpreter by forcing the WSGI application to run in that main interpreter 
context created when Python was initialised in the process.

This is done using:

WSGIApplicationGroup %{GLOBAL}

or:

WSGIScriptAlias / /some/path/wsgi.py process-group=group-name application-group=%{GLOBAL}

Because it runs in the main interpreter, it saves on the extra memory that a 
sub interpreter would use. More importantly, it avoids problems with third 
party C extension modules for Python that aren't implemented correctly to work 
in sub interpreters.

So, if using daemon mode and delegating one WSGI application to a daemon 
process group, always set the application group (interpreter context) to 
%{GLOBAL}.
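
To confirm the setting took effect, the application can be wrapped so that it 
complains when it is not in the main interpreter. This is only a sketch: it 
assumes the 'mod_wsgi.application_group' environ key is an empty string for 
the main interpreter, and the wrapper name is made up for illustration.

```python
def warn_if_sub_interpreter(application):
    """Wrap a WSGI application and log when it runs in a sub interpreter."""

    def wrapper(environ, start_response):
        # Empty string: main interpreter (%{GLOBAL}).
        # Missing key: not running under mod_wsgi at all, so say nothing.
        group = environ.get("mod_wsgi.application_group")
        if group:
            environ["wsgi.errors"].write(
                "running in sub interpreter %r, not %%{GLOBAL}\n" % group
            )
        return application(environ, start_response)

    return wrapper
```

In the WSGI script file this would be applied as 
application = warn_if_sub_interpreter(application), with the message ending up 
in the Apache error log on every request until the configuration is fixed.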

If you had multiple WSGI applications running in the same daemon process group, 
you can't necessarily force them to all run in the same main interpreter as 
some frameworks such as Django will not allow you to run multiple Django sites 
in the same interpreter. In that case you are better off delegating each Django 
instance to a separate daemon process group and then force use of the main 
interpreter.

If using embedded mode, seriously consider using daemon mode with one WSGI 
application to each daemon process group.

So that is how you can eliminate use of sub interpreters and the apparent lazy 
creation of them.

Next issue is the lazy loading of the WSGI script file itself and thus the 
lazy loading of your WSGI application on the first request. This is done 
lazily for the same reasons as above: you just cannot know what WSGI 
applications may need to be loaded and into what process/interpreter until the 
request actually arrives.

There is, though, also a way of forcing preloading of the WSGI script file.

If you are using mod_wsgi 3.X and you say:

WSGIScriptAlias / /some/path/wsgi.py process-group=group-name application-group=%{GLOBAL}

That is, if you set both process-group and application-group options at the 
same time, it is saying in advance that that WSGI application will always run 
in that context no matter what other settings such as WSGIProcessGroup and 
WSGIApplicationGroup may say.

As a result, when mod_wsgi sees both options set on WSGIScriptAlias, it will 
preload that WSGI script file when the process first starts rather than on the 
first request.

If you are using mod_wsgi 2.X, a bit more work has to be done as those options 
aren't accepted by WSGIScriptAlias on older mod_wsgi. In that case you need to 
use:

WSGIScriptAlias / /some/path/wsgi.py
WSGIProcessGroup group-name
WSGIApplicationGroup %{GLOBAL}

WSGIImportScript /some/path/wsgi.py process-group=group-name application-group=%{GLOBAL}

The WSGIImportScript directive is saying to preload that Python code file when 
the process starts. The WSGIProcessGroup and WSGIApplicationGroup in the 
context where WSGIScriptAlias is applied must then match the process and 
application group names given for WSGIImportScript. In other words, the static 
(preload) and dynamic (per request) settings must match. If they don't align, 
then you will end up loading the WSGI script file more than once in two 
distinct interpreter contexts and waste memory.

So, use mod_wsgi 3.X and the options to WSGIScriptAlias to avoid mistakes.

Finally, although you can force preload the WSGI script file on process start 
rather than first request, this doesn't mean your whole application will load. 
This is because some frameworks such as Django will only initialise themselves 
upon the first request occurring. Thus, to force the WSGI application to 
initialise and potentially load stuff, you would need to fake a web request 
against the WSGI application at the point the WSGI script file is being loaded. 
The easiest way to do this is to use WebTest.

For example, if you have:

import os
os.environ["DJANGO_SETTINGS_MODULE"] = "mysite.settings"

from django.core.wsgi import get_wsgi_application
application = get_wsgi_application()

at the end of the WSGI script file you would add:

from webtest import TestApp
testapp_wrapper = TestApp(application)
testapp_wrapper.get('/')

Now because Django can also lazily load parts of the application when specific 
URLs are hit, you may want to make requests in this way against a few key URLs 
to get important parts of your application loaded.
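
If you would rather not add WebTest as a dependency just for this, the same 
idea can be sketched with only the standard library. This swaps WebTest for 
wsgiref.util.setup_testing_defaults to build a fake request. The warm_up 
function name is made up for illustration, and the URLs passed to it are 
whatever matters for your site.

```python
from wsgiref.util import setup_testing_defaults


def warm_up(application, paths):
    """Send one fake GET request per path and return {path: status line}."""
    statuses = {}
    for path in paths:
        environ = {
            "REQUEST_METHOD": "GET",
            "SCRIPT_NAME": "",
            "PATH_INFO": path,
        }
        # Fill in the remaining keys a WSGI application expects to find.
        setup_testing_defaults(environ)

        captured = []

        def start_response(status, headers, exc_info=None):
            captured.append(status)
            return lambda data: None  # legacy write() callable

        result = application(environ, start_response)
        try:
            # Consume the body so any lazily executed code actually runs.
            for _ in result:
                pass
        finally:
            close = getattr(result, "close", None)
            if close is not None:
                close()

        statuses[path] = captured[0] if captured else None
    return statuses
```

At the end of the WSGI script file you would then call something like 
warm_up(application, ['/', '/some/key/url/']). The fake request uses 127.0.0.1 
as its host, so with Django you may find that host has to be allowed by your 
settings before the request gets as far as your views.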

Enough. Brain dump mode off. Hopefully I didn't make too many mistakes in that. 
I have been getting too many queries about related stuff in recent times, so 
about time I got all that out so I can just refer to it all in one place.

Graham

On 19/06/2013, at 7:17 PM, venu k <[email protected]> wrote:

> Hi Graham,
> 
> Thank your for responding to my tweets. Sorry , i should have done posting my 
> query here. 
> 
> 
> This is my project Structure. ( under /var/www/)
> 
> bingoproject
> |-- manage.py
> |-- bingo
> |   |-- __init__.py
> |   |-- __init__.pyc
> |   |-- settings.py
> |   |-- settings.pyc
> |   |-- urls.py
> |   |-- wsgi.py
> |   `-- wsgi.pyc
> `-- bingoapp
>     |-- __init__.py
>     |-- __init__.pyc
>     |-- migrations
>     |   |-- 0001_initial.py
>     |   |-- 0001_initial.pyc
>     |   |-- __init__.py
>     |   `-- __init__.pyc
>     |-- models.py
>     |-- models.pyc
>     |-- static
>     |   `-- admin
>     |       |-- css
> 
> this is httpd.conf:
> 
> ServerName bingo
> WSGIPythonHome /usr/local
> WSGILazyInitialization on
> #WSGIRestrictEmbedded on
> 
> This is bingo.conf file :
> 
> <VirtualHost *:80>
> ServerName www.bingo.com
> WSGIDaemonProcess venukulala-bingo user=venukulala group=venukulala 
> processes=2 threads=5 python-eggs=/tmp/python-eggs/ python-path=/$
> WSGIProcessGroup venukulala-bingo
> WSGIScriptAlias / /var/www/bingoproject/bingo/wsgi.py
> 
> <Directory /var/www/bingoproject/>
>         Order deny,allow
>         Allow from all
> </Directory>
> 
> Alias /static 
> /usr/local/lib/python2.7/dist-packages/django/contrib/admin/static
> <Directory/usr/local/lib/python2.7/dist-packages/django/contrib/admin/static >
>         Order allow,deny
>         Allow from all
>         SetHandler None
>         FileETag none
>         Options FollowSymLinks
> </Directory>
> 
> ErrorLog /var/log/apache2/bingo-error.log
> LogLevel info
> CustomLog /var/log/apache2/bingo-access.log combined
> </VirtualHost>
> 
> 
> i have tried changing the WSGIPythonHome to actual path in /usr/lib , also 
> creating a virtual env and then pointing to this path ...didnt work.
> 
> These are the errors i see in the error.log ...
> 
> 
> [Wed Jun 19 14:31:03 2013] [info] [client 10.74.152.157] mod_wsgi 
> (pid=29245): Connect after WSGI daemon process restart, attempt #1.
> [Wed Jun 19 04:01:03 2013] [info] mod_wsgi (pid=5253): Stopping process 
> 'venukulala-bingo'.
> [Wed Jun 19 04:01:03 2013] [info] mod_wsgi (pid=5253): Destroying 
> interpreters.
> [Wed Jun 19 04:01:03 2013] [info] mod_wsgi (pid=5253): Destroy interpreter 
> 'www.bingo.com|'.
> [Wed Jun 19 04:01:03 2013] [info] mod_wsgi (pid=5253): Cleanup interpreter ''.
> [Wed Jun 19 04:01:03 2013] [info] mod_wsgi (pid=5253): Terminating Python.
> [Wed Jun 19 14:31:03 2013] [info] mod_wsgi (pid=7585): Create interpreter 
> 'www.bingo.com|'.
> [Wed Jun 19 14:31:03 2013] [info] [client 10.74.152.157] mod_wsgi (pid=7585, 
> process='venukulala-bingo', application='www.bingo.com|'): Loading WSGI 
> script '/var/www/bingoproject/bingo/wsgi.py'.
> [Wed Jun 19 04:01:03 2013] [info] mod_wsgi (pid=5253): Python has shutdown.
> [Wed Jun 19 14:31:03 2013] [info] mod_wsgi (pid=7699): Attach interpreter '
> 
> 
> Any help is greatly appreciated.
> 
> Thanks,
> Venu
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "modwsgi" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/modwsgi.
> For more options, visit https://groups.google.com/groups/opt_out.
>  
>  

