Re: [modwsgi] Processes and memory use

Graham Dumpleton Thu, 07 Jan 2010 15:09:50 -0800

2010/1/6 Ian Bicking <[email protected]>:
> I'm using mod_wsgi with toppcloud
> (http://bitbucket.org/ianb/toppcloud)... it's kind of server
> configuration and tool to manage sites on that server.  The primary
> goal is easy deployment and management of Python applications, with
> minimal fuss.  It uses mod_wsgi.
>
> So... right now I've configured it to have 5 processes and no threads
> for each application, with no process sharing between applications.
> Right now each application has to live on its own domain, but that's
> just temporary -- in the future a site should be possibly formed out
> of several applications (/ -a CMS, /blog -a blog, /gallery -some
> gallery app, etc).  But that means 5 processes for each of these
> areas, and the memory use gets a bit high: memory-per-app(~20Mb) *
> number-of-apps * 5


It only means five processes for each if you explicitly delegated each
application to a separate daemon mode process group.
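
To make the arithmetic in the quoted estimate concrete, here is a
rough Python sketch. The function name and the choice of three
applications are made up for illustration; the ~20MB per application
and 5 processes are the figures quoted above.

```python
def estimated_memory_mb(per_app_mb, num_apps, processes_per_group):
    """Rough estimate for the case where every application has its own
    daemon process group, so each process holds one copy of its app."""
    return per_app_mb * num_apps * processes_per_group


# Hypothetical site made of three applications (/, /blog, /gallery),
# each delegated to its own 5-process daemon process group.
print(estimated_memory_mb(20, 3, 5))  # 300
```

This only models the separate-process-group case; it says nothing
about what is shared or duplicated once applications are combined.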

That said, even if you do delegate them to the same daemon mode
process group, the default for mod_wsgi is to separate each mounted
WSGI application, i.e., distinct SERVER_HOST/SCRIPT_NAME, into its own
persistent sub interpreter within each process. This is done by
default because many WSGI application instances don't like coexisting
with other WSGI applications. For example, you cannot run multiple
Django instances in the same Python interpreter instance due to their
use of global data such as the DJANGO_SETTINGS_MODULE environment
variable.

If you are running WSGI applications which can safely coexist with
other WSGI applications using the same framework, you can however
override this default. The first way is to specify:

  WSGIApplicationGroup %{SERVER}

This replaces the built-in default of:

  WSGIApplicationGroup %{RESOURCE}

As explained above, that default of %{RESOURCE} means each WSGI
application (resource) is given its own group. The use of %{SERVER}
means however that one sub interpreter is used for all WSGI
applications running under a specific server, where server means the
virtual host server name in conjunction with the listener port, with
the exception that ports 80/443 are treated specially and effectively
merged so that they run together in the same Python sub interpreter.
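
The grouping rules just described can be pictured with a small Python
model. This is a toy sketch and not mod_wsgi's actual code: the
function names, the example host name and the exact string layout
(including the '|' separator) are assumptions for illustration only.

```python
def server_group_name(server_name, port):
    """Toy model of %{SERVER}: the host name, plus ':port' unless the
    port is one of the standard HTTP/HTTPS ports (80/443)."""
    if port in (80, 443):
        return server_name
    return "%s:%d" % (server_name, port)


def resource_group_name(server_name, port, script_name):
    """Toy model of %{RESOURCE}: the server part plus the mount point,
    so every mounted application gets a distinct group."""
    return server_group_name(server_name, port) + "|" + script_name


# Ports 80 and 443 collapse into one group; port 8080 does not.
print(server_group_name("www.example.com", 80))
print(server_group_name("www.example.com", 443))
print(server_group_name("www.example.com", 8080))

# Two mount points on the same server get different %{RESOURCE} groups.
print(resource_group_name("www.example.com", 80, "/blog"))
print(resource_group_name("www.example.com", 80, "/gallery"))
```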

Note these are still Python sub interpreters within a process and not
the first or main Python interpreter. This is important to know as any
Python C extension modules that use the simplified Python GIL state API
will only work in the main Python interpreter. If you think you might
be using such modules, it is better to explicitly force WSGI
applications to run in the first or main Python interpreter. This can
be done by using:

  WSGIApplicationGroup %{GLOBAL}

Since there is only one first or main Python interpreter, by naming it
you can also delegate applications running under different servers to
run together. So long as you aren't using Python C extension modules
that must run in main Python interpreter, you can actually take
complete control of what runs together by explicitly naming the sub
interpreter to use. Thus you might have:

  WSGIApplicationGroup all-pylons-applications

and:

  WSGIApplicationGroup django-application-1
  WSGIApplicationGroup django-application-2

Obviously, you need to stick these in appropriate configuration
context. This could be inside of Location directive related to URL
namespace or Directory directive where referring to location of WSGI
script file.

Alternatively, in mod_wsgi 3.X+ you can specify the process group and
application group with the WSGIScriptAlias directive. If both are
specified at the same time, a side effect will be that the WSGI
application will be preloaded on process startup.

You thus might have an example like the following if using just directives.

  WSGIDaemonProcess general user=www-data processes=5 threads=1 \
  maximum-requests=200 inactivity-timeout=3600 display-name=wsgi \
  home=/var/www

  # Delegate all WSGI applications to run in general process group.

  WSGIProcessGroup general

  # Default to putting each WSGI application in its own sub interpreter. This is
  # the default, but say it explicitly to make it obvious.

  WSGIApplicationGroup %{RESOURCE}

  # Mount some WSGI applications at different URLs.

  WSGIScriptAlias /blog /some/path/blog/django.wsgi
  WSGIScriptAlias /gallery /some/path/gallery/django.wsgi
  WSGIScriptAlias /wiki /some/path/wiki/pylons.wsgi
  WSGIScriptAlias /webmail /some/path/webmail/pylons.wsgi
  WSGIScriptAlias / /some/path/cms/pylons.wsgi

  # Delegate where WSGI applications will run.

  <Location /blog>
  WSGIApplicationGroup django-application-1
  </Location>

  <Location /gallery>
  WSGIApplicationGroup django-application-2
  </Location>

  <Location /wiki>
  WSGIApplicationGroup all-pylons-applications
  </Location>

  <Location /webmail>
  WSGIApplicationGroup %{GLOBAL}
  </Location>

  <Location />
  WSGIApplicationGroup all-pylons-applications
  </Location>

Or if using options to WSGIScriptAlias instead, get rid of
WSGIApplicationGroup from Location directives and use instead:

  # Mount some WSGI applications at different URLs.

  WSGIScriptAlias /blog /some/path/blog/django.wsgi \
  application-group=django-application-1
  WSGIScriptAlias /gallery /some/path/gallery/django.wsgi \
  application-group=django-application-2
  WSGIScriptAlias /wiki /some/path/wiki/pylons.wsgi \
  application-group=all-pylons-applications
  WSGIScriptAlias /webmail /some/path/webmail/pylons.wsgi \
  application-group=%{GLOBAL}
  WSGIScriptAlias / /some/path/cms/pylons.wsgi \
  application-group=all-pylons-applications
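
One way to check which process group and sub interpreter a request
actually ends up in is a throwaway WSGI script along the following
lines. This is only a sketch: the output layout is made up, and it
assumes the mod_wsgi.process_group and mod_wsgi.application_group
keys that mod_wsgi adds to the WSGI environ. It would be mounted with
WSGIScriptAlias like any of the applications above.

```python
import os


def application(environ, start_response):
    # Report where this request is being handled. The two mod_wsgi.*
    # keys are assumed to be supplied by mod_wsgi; .get() returns None
    # if the script is run under some other WSGI server.
    lines = [
        "pid: %d" % os.getpid(),
        "process group: %r" % environ.get("mod_wsgi.process_group"),
        "application group: %r" % environ.get("mod_wsgi.application_group"),
        "script name: %r" % environ.get("SCRIPT_NAME"),
    ]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Requesting the same URL repeatedly and comparing the pid and group
names shows whether the delegation is doing what was intended.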

> So I'm hoping for advice, or maybe this will turn into a feature
> request.

I'd say the amount of configurability to control which processes and
which sub interpreters WSGI applications can run in covers pretty much
all that one would want to do.

One can get even more magic than above by having the process group or
application group (sub interpreter) be dynamically chosen at run time
for a request. This is done by sourcing the name of the group from an
Apache variable. E.g.:

  WSGIApplicationGroup %{ENV:APPLICATION_GROUP}

The value of that variable can then be dynamically set from
mod_rewrite rules or mod_headers rules.

Because one can put WSGIProcessGroup/WSGIApplicationGroup in any
Location URL context, it is even possible to split a single
application across multiple sub interpreters or processes.

For example, if certain URLs within an application had large transient
memory usage requirements but didn't occur very often, you could
delegate just those URLs to be handled in their own daemon process
group which recycles processes on a much shorter interval. For example:

  WSGIDaemonProcess general user=www-data processes=5 threads=1 \
  maximum-requests=200 inactivity-timeout=3600 display-name=wsgi \
  home=/var/www

  WSGIDaemonProcess memory-hungry user=www-data processes=5 threads=1 \
  maximum-requests=50 inactivity-timeout=60 display-name=wsgi \
  home=/var/www

  WSGIScriptAlias / /some/path/cms/pylons.wsgi

  # Everything goes to general process by default.

  WSGIProcessGroup general

  # Send just certain URLs to memory hungry process group.

  <Location /some/sub/url>
  WSGIProcessGroup memory-hungry
  </Location>

> The current configuration:
>
> WSGIDaemonProcess general user=www-data processes=5 threads=1
> maximum-requests=200 inactivity-timeout=3600 display-name=wsgi
> home=/var/www
>
> One possibility of course is to use processes=1 and threads=5, or
> something like that (the 15 thread default seems really high).

The default of 15 is to give a safe buffer where people have long
running requests.

In mod_wsgi 3.X+ thread management is a bit smarter than 2.X and
although the underlying C thread exists, it will never be activated in
the Python world unless needed. That is, the code will always try to
use the most recently used thread. Thus, even with 15 threads, if the
application only ever needs 5 to handle concurrent requests, the
remainder will stay dormant and additional memory resources for each
thread will never be created in Python space.
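
The effect of always preferring the most recently used thread can be
seen in a toy Python simulation. This is not how mod_wsgi implements
it in C; the function name, the event encoding and the request
pattern are all invented for the example. It only shows that a
last-in, first-out stack of idle threads never touches more threads
than the peak concurrency.

```python
def simulate(pool_size, events):
    """Toy model of a most-recently-used (LIFO) stack of idle threads.

    `events` is a list where +1 means a request starts and -1 means the
    most recently started request finishes. Returns the set of thread
    ids that were ever activated. Raises IndexError if more requests
    are in flight than there are threads.
    """
    idle = list(range(pool_size))   # end of the list is the top of the stack
    busy = []
    activated = set()
    for event in events:
        if event > 0:
            thread_id = idle.pop()      # take the most recently used thread
            busy.append(thread_id)
            activated.add(thread_id)
        else:
            idle.append(busy.pop())     # finished thread goes back on top
    return activated


# 15 threads, but never more than 5 requests in flight at once.
events = [+1] * 5 + [-1] * 5 + [+1] * 3 + [-1] * 3 + [+1] * 5 + [-1] * 5
print(len(simulate(15, events)))  # 5
```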

> But
> I'd like to avoid threading; not all frameworks work well with it, and
> I like the isolation and simplicity of a single process per request.
> Ideally I'd like maybe 1 process and 1 thread per app, but for new
> processes to be created as needed.  It seems like inactivity-timeout
> could accomplish this, but I'm unclear on its purpose or mechanism.

At the moment inactivity-timeout is only for resetting a process which
hasn't been used in a while. Specifically, if you have an application
that might only be used once a day for a while, there is no point in
it using memory all day.

The intention is to eventually have dynamic process creation, but not
necessarily varying numbers of processes within a group, as that gets
much harder to do and is in part why fastcgi implementations suck in
some respects and don't create processes properly or don't destroy
them afterwards. I have documented some of the intentions on this in:

  http://blog.dscpl.com.au/2009/03/future-roadmap-for-modwsgi.html

Not sure when I will get to any of that future stuff now.

> Right now 5 processes are started right away.  Will there be less than
> 5 processes after an hour of inactivity?

There will not be fewer processes, but provided preloading of WSGI
applications is not being done, the processes will be empty and in
mod_wsgi 3.X should only take up a few hundred kilobytes of memory.
Thus, the minimum memory possible, with the process just waiting to be
activated again.

>  It doesn't seem like it...
> does it just kill and respawn processes after one hour?  If so I'm not
> sure of the point.  So, if inactivity-timeout worked like my intuition
> would imply it should work (kill idle processes, restart on demand)
> that'd be great.

Sorry, only in the future when I can get to it, and then perhaps not
exactly as you have in mind. :-)

> What would *really* be ideal is if there were, say, 10 processes
> total, and those 10 processes were allocated among all possible
> consumers according to load.  Depending on the size of the server,
> there's usually a top limit above which you get declining performance;
> so even in high-load situations it'd be better to have 10 quick
> processes handling requests than 30 slow processes.
>
> If there's other suggestions about how to manage memory, I'd love to
> hear them... I just don't want to trade reliability for resources.

Have a look through what I have described so far and we can then
discuss more as need be.

I still haven't got my proper Internet back after moving house though
and also off work so no work Internet either, so responses may be
intermittent at the moment.

Graham

