Re: Multi-db branch feedback/questions

[EMAIL PROTECTED] Tue, 11 Jul 2006 12:50:50 -0700

Replying to myself... here's what I've come up with to explain the
problems I see in my current implementation and what I think should be
done to fix them. Apologies in advance -- it's quite long.


I've implemented a bit of this just to make sure it would work, mainly
the basic parts in django.db.__init__.py. Still need to figure out how
to come up with some tests for these issues that won't be slow as
boiled slugs to run.

Anyway, here's the writeup:

Django can run as a WSGI app within a container like Paste, where the
same app may be run with different settings at the same time in
multiple simultaneous threads, and in multiple serial requests within
the same thread.

Below are listed a few changes that I think are required to make the
multi-db branch's settings and connection handling safe for serving in
that kind of environment.

For the purposes of this discussion, assume that the app myproj.myapp
is configured within the container with two sets of settings, SA and
SB. Imagine that we have a server running threads T1 and T2, and
answering requests R1, R2, ... etc.


Proposed change: Clear django.db.connections on request finish,
                 make django.db.connections thread-local.

Where: django.db.__init__.py

Rationale: Settings may change between requests, including what
DATABASE_* settings are labeled with a given key in OTHER_DATABASES.
django.db.connections is a LazyConnectionManager, so resetting its
_connections property between requests will clear named connections for
the next request. Of course, R1 may finish on T1 while R2 is running on
T2, so we only want to clear the django.db.connections._connections in
T1 -- so, django.db.connections._connections must become a thread-local
value.
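
A rough sketch of the idea, not the branch's actual code: the class below
is a simplified stand-in for LazyConnectionManager, the `connect` factory
and the `reset()` method are hypothetical names, and `reset()` is what
would be hooked up to run when a request finishes.

```python
import threading


class LazyConnectionManager:
    """Simplified stand-in for django.db.connections (illustrative only)."""

    def __init__(self, connect):
        # `connect` is an assumed factory: connection name -> connection object.
        self._connect = connect
        self._local = threading.local()

    @property
    def _connections(self):
        # Each thread lazily gets its own dict of named connections.
        try:
            return self._local.connections
        except AttributeError:
            self._local.connections = {}
            return self._local.connections

    def __getitem__(self, name):
        conns = self._connections
        if name not in conns:
            conns[name] = self._connect(name)
        return conns[name]

    def reset(self):
        # Meant to run at the end of a request; clears only the calling
        # thread's connections, so a request on another thread is untouched.
        self._connections.clear()


# Placeholder factory; a real one would build a connection from settings.
connections = LazyConnectionManager(connect=lambda name: object())
```

With this shape, R1 finishing on T1 can call connections.reset() without
disturbing whatever R2 has already opened on T2.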


Proposed change: Make django.db.connection, .backend, etc proxies

Where: django.db.__init__.py

Rationale: Apply the same isolation to the default connection and its
backend, etc (which are all module-level variables in django.db) as to
named connections in django.db.connections. Unfortunately in this case
we don't have a convenient dict to clear out at the end of each
request. And users may have 'from django.db import connection' anywhere
in their code. So django.db.connection, django.db.backend, etc must be
thread-local and also cleared at the end of each request, but without
rebinding their names.

All of these variables are references to attributes in
django.db.connection_info, which is a ConnectionInfo instance
initialized from the default settings (settings.DATABASE_*). The
shortest route to making them request- and thread-safe is to make them
instead attributes of a new DefaultConnectionInfoProxy instance, in
which each attribute delegates to the same attribute in a
ConnectionInfo stored in a thread-local container and reset at the end
of each request.
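
Something along these lines is what I have in mind. This is only a
sketch: FakeConnection, _AttrProxy, current(), reset() and the
get_settings callable are made-up names for illustration, and the
ConnectionInfo here is a bare-bones stand-in for the real one.

```python
import threading


class FakeConnection:
    """Placeholder for a real database connection."""

    def __init__(self, name):
        self.name = name


class ConnectionInfo:
    """Bare-bones stand-in for the branch's ConnectionInfo."""

    def __init__(self, settings):
        self.connection = FakeConnection(settings["DATABASE_NAME"])


class _AttrProxy:
    """Forwards attribute access to one attribute of the current ConnectionInfo."""

    def __init__(self, info_proxy, name):
        self._info_proxy = info_proxy
        self._name = name

    def __getattr__(self, attr):
        # Only called for attributes not found on the proxy itself.
        target = getattr(self._info_proxy.current(), self._name)
        return getattr(target, attr)


class DefaultConnectionInfoProxy:
    def __init__(self, get_settings):
        # `get_settings` is an assumed callable returning the active settings.
        self._get_settings = get_settings
        self._local = threading.local()
        self.connection = _AttrProxy(self, "connection")

    def current(self):
        # Build the per-thread ConnectionInfo on first use.
        info = getattr(self._local, "info", None)
        if info is None:
            info = self._local.info = ConnectionInfo(self._get_settings())
        return info

    def reset(self):
        # Would be hooked to the end of each request.
        self._local.info = None


active_settings = {"DATABASE_NAME": "sa"}
connection_info = DefaultConnectionInfoProxy(lambda: active_settings)
connection = connection_info.connection  # the name is bound once, never rebound
```

User code that did 'from django.db import connection' keeps the same
proxy object, while the ConnectionInfo behind it is rebuilt from the
current settings after each reset.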


Proposed change: Move connection information into Manager.db

Where: django.db.models.manager

Rationale: Moving the per-model connection information from model._meta
to model._default_manager has already been discussed and I think is
generally approved -- this is just to flesh out a bit more what that
will mean.

Since the connection for a model will now be wholly specified by
settings, the model's default manager will have to look up the
connection to use for the model in settings.OTHER_DATABASES by
examining the MODELS entry in each value in that dict for the best
match to the model. A manager instance may be shared across threads, so
the storage for the current connection must be thread-local. And the
same manager may encounter different settings on different requests in
the same thread, if (say) R1 in T1 loads SA but R2 also in T1 loads SB,
so not only the ConnectionInfo but the whole connection scenario (named
or default, name of connection if named, settings to use) must be reset
at the end of each request.

To do this, I propose making manager.db a descriptor that serves as a
thread-local ConnectionInfo proxy. The proxy can be initialized during
contribute_to_class and attached only to the instance (not the Manager
class).

In use, where the current multi-db code has an access pattern like:

    connection = model._meta.connection

The new pattern would be:

    connection = model._default_manager.db.connection

Or in the manager itself:

    connection = self.db.connection
    backend = self.db.backend

and so on.

The model._default_manager.db descriptor will (on first access)
initialize and return the correct ConnectionInfo for the current
settings state that applies to the manager's model, caching that
ConnectionInfo in thread-local storage for the rest of the request and
connecting to the request_finished signal to clear the cache. This way,
only Managers used during a request would reset at the end of the
request.
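
A minimal sketch of the lookup-and-cache behavior, with several
assumptions: the format of the MODELS entries ('app.Model' or 'app'
strings, exact model match preferred) is a guess, pick_database,
ManagerDb, get_settings and reset() are hypothetical names, and the
descriptor returns only the chosen connection name rather than a full
ConnectionInfo. Since Python only invokes descriptors found on a class,
the sketch puts the descriptor on the class and keeps the thread-local
state on each manager instance.

```python
import threading


def pick_database(settings, model_label):
    """Return the OTHER_DATABASES key that best matches model_label, or None.

    Assumes MODELS entries are 'app.Model' or 'app' strings; an exact
    model match beats an app-level match. None means the default connection.
    """
    app_label = model_label.split(".")[0]
    app_match = None
    for name, db_settings in settings.get("OTHER_DATABASES", {}).items():
        models = db_settings.get("MODELS", [])
        if model_label in models:
            return name
        if app_label in models and app_match is None:
            app_match = name
    return app_match


class ManagerDb:
    """Descriptor that resolves and caches the connection choice per thread."""

    def __get__(self, manager, owner=None):
        if manager is None:
            return self
        cache = manager._local
        if not hasattr(cache, "db"):
            # First access in this thread since the last reset.
            cache.db = pick_database(manager.get_settings(), manager.model_label)
        return cache.db


class Manager:
    db = ManagerDb()

    def __init__(self, model_label, get_settings):
        self.model_label = model_label
        self.get_settings = get_settings  # assumed callable: active settings
        self._local = threading.local()   # per-manager, per-thread cache

    def reset(self):
        # Would be connected to request_finished on first access.
        if hasattr(self._local, "db"):
            del self._local.db
```

So if R1 in T1 runs with SA and R2 in T1 runs with SB, the manager
re-resolves its connection on the first access in R2, because the cache
was dropped when R1 finished.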


Proposed change: Add a thread-local cache of connections

Where: django.db.__init__.py

Rationale: Efficiency. Settings may change between requests, but the
actual database connection picked out by the tuple (DATABASE_ENGINE,
DATABASE_NAME, DATABASE_USER, DATABASE_PASSWORD, DATABASE_HOST,
DATABASE_PORT) will not. However, we don't want to share a single
ConnectionInfo instance among multiple threads. Therefore if we add a
thread-local cache keyed by the tuple of DATABASE_* settings, we can
instantiate each needed ConnectionInfo only once per thread.

This is an optimization that can wait until the rest of the code is
working. It should be fairly easy to implement in
ConnectionInfo.__new__.
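
Roughly like this, as a sketch only: the ConnectionInfo below is a
stripped-down stand-in, _KEYS and _cache are hypothetical module-level
names, and settings are passed as a plain dict for illustration.

```python
import threading

_KEYS = ("DATABASE_ENGINE", "DATABASE_NAME", "DATABASE_USER",
         "DATABASE_PASSWORD", "DATABASE_HOST", "DATABASE_PORT")
_cache = threading.local()


class ConnectionInfo:
    """Stripped-down stand-in; caches one instance per settings tuple per thread."""

    def __new__(cls, settings):
        key = tuple(settings.get(k) for k in _KEYS)
        try:
            instances = _cache.instances
        except AttributeError:
            # First ConnectionInfo requested in this thread.
            instances = _cache.instances = {}
        if key not in instances:
            self = super().__new__(cls)
            self.key = key
            self.connection = object()  # placeholder for a real connection
            instances[key] = self
        return instances[key]
```

Two requests in the same thread whose settings resolve to the same
DATABASE_* tuple get the same instance back, while another thread gets
its own.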

Thoughts?

JP

