Thanks for the suggestions.

I have been using pgbouncer for a while, and while it helps with the
overhead of setting up and tearing down postgres processes, it still
leaves the overhead of socket setup/teardown between the django app
and the pgbouncer process.  Because almost every django request I have
uses the DB, I don't mind "wasting" a DB connection on the few that
don't touch the DB.
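
To put a rough number on that overhead, a quick micro-benchmark along
these lines makes it visible.  The connection parameters below are
placeholders for whatever your setup uses; 6432 is just pgbouncer's
default listen port:

    import time
    import psycopg2

    start = time.time()
    for _ in range(100):
        # each iteration pays the full socket setup/teardown cost
        conn = psycopg2.connect(host='127.0.0.1', port=6432,
                                user='app', dbname='appdb')
        conn.close()
    print('avg connect/close: %.2f ms' % ((time.time() - start) * 10))

Multiply that per-request cost by your request volume and it adds up.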

I dug into the django 1.3 code today and found that
BaseDatabaseWrapper is, as you say, storing everything in thread-local
storage, including the db connection.  I also answered my own question
of how modules are cached in python.  Once a module has been imported
in any way, it stays loaded and can be reached through sys.modules by
any thread, and any globals defined in a loaded module remain in
memory.
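
In other words, a module-level global works as a per-process cache.
A tiny illustration (the module and function names here are made up,
not anything from django):

    # dbholder.py -- any name bound at module level lives for the
    # lifetime of the process once the module is first imported.
    import psycopg2

    _connection = None

    def get_connection(dsn):
        # reuse the existing connection if we still have a live one
        global _connection
        if _connection is None or _connection.closed:
            _connection = psycopg2.connect(dsn)
        return _connection

and any thread can later reach the very same module object:

    import sys
    holder = sys.modules['dbholder']  # same object, same globals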

At the end of the day, I don't mind if there is simply a 1-to-1
mapping between django threads (spread over multiple processes, even)
and DB connections.  I don't strictly want or require pooling; I was
only looking into a pool to work around an issue I found: DB
connections are leaked if you disable the db connection close after
every request.  Ultimately, if I could find the source of this leak,
I would be a happy camper.
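
In case it helps anyone reproduce it: my working theory is that the
leak happens when worker threads exit without closing their
thread-local connection, so nothing ever closes it.  Here is a rough
sketch of the per-thread bookkeeping I've been experimenting with to
test that theory (all names are mine, not django's):

    import threading

    _lock = threading.Lock()
    _conns = {}  # thread object -> DB connection

    def register(conn):
        # call this wherever the request gets its connection
        with _lock:
            _conns[threading.current_thread()] = conn

    def reap_dead():
        # periodically close connections whose thread has exited
        with _lock:
            for thread in list(_conns):
                if not thread.is_alive():
                    _conns.pop(thread).close()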

I know it sounds like I'm screwing around with a case of "premature
optimization" and all, but I've done a good amount of load testing
against our service, with the number of concurrent connections we
expect to see, and persistent DB connections do seem to make a
dramatic difference.  Over the course of tens, going on hundreds, of
thousands of requests, they can save minutes' worth of processing
time.

On May 18, 1:52 pm, akaariai <akaar...@gmail.com> wrote:
> On May 18, 7:54 pm, Ken J <k...@filewave.com> wrote:
>
> > I'm quite curious as to what persists in memory across requests in
> > terms of django application variables.  I'm currently running a django
> > app in mod_wsgi daemon mode.
>
> > Because of performance concerns when dealing with large numbers of
> > concurrent requests, I wanted to modify django to keep persistent DB
> > connections to Postgres using a connection pool.
>
> > This in turn got me wondering, how can I persist a thread pool, or
> > even a simple DB connection, across requests?  I realize anything that
> > is global to the wsgi entry point script will persist.  The current
> > wsgi entry point I'm using is something like:
>
> > import django.core.handlers.wsgi
>
> > _application = django.core.handlers.wsgi.WSGIHandler()
>
> > def application(environ, start_response):
> >     return _application(environ, start_response)
>
> > Obviously _application will remain, but since code modules are
> > dynamically loaded based on URL resolvers, would the view/model/db
> > connection not be destroyed once the variables referencing said
> > objects go out of scope?
>
> > From logging statements, it has become apparent I can in fact make DB
> > connections persistent simply by not closing the DB connection after
> > the request has finished.  Unfortunately, I also found this to slowly
> > leak socket connections to the DB eventually making it so that I can't
> > log into the DB, hence why I was looking into a connection pool.
>
> > Anyways, I was hoping someone could shed some light as to the
> > internals of python/django on why django/db/__init__.py is able to
> > reference persistent connections.
>
> > My best guess is that because
>
> > connections = ConnectionHandler(settings.DATABASES)
>
> > is at the top level of a module, it remains held within the python
> > interpreter after being imported, thus holding a reference.
>
> > Any insight would be greatly appreciated.
>
> First, you should look into external connection poolers. PgBouncer is
> excellent if you need just a connection pool. pgpool-II is another
> option, but it provides way more than just a connection pool.
>
> If you would like to implement a connection pool in Python code, it
> isn't the easiest thing to do. If you are using Django 1.4, then the
> connections object will be thread local - that is, it provides a
> different connection for each thread. Note that if you are using
> multiple processes (instead of threads), the processes share nothing
> with each other. This is the reason you should look for external
> poolers - they can see all connection attempts simultaneously, but
> Python poolers are restricted to seeing one process at a time.
>
> Still, if you want to experiment with poolers in Python code, I have
> written a couple of attempts at a connection pool you can use as a
> basis for your work: https://github.com/akaariai/django-psycopg-pooled
> and https://github.com/akaariai/django_pooled.
>
> Short answer from my experiments: you do not want to implement
> connection pooling in Python, at least if performance is the reason
> for pooling. You could however do some other funny things, like
> rewriting queries to use prepared statements.
>
>  - Anssi
