Replying to myself... here's what I've come up with to explain the problems I see in my current implementation and what I think should be done to fix them. Apologies in advance -- it's quite long.
I've implemented a bit of this just to make sure it would work, mainly the basic parts in django.db.__init__.py. I still need to figure out how to write tests for these issues that won't be as slow as boiled slugs to run.

Anyway, here's the writeup.

Django can run as a WSGI app within a container like Paste. There, the same app may be run with different settings at the same time in multiple simultaneous threads, and may serve multiple serial requests within the same thread. Below are a few changes that I think are required to make the multi-db branch's settings and connection handling safe for serving in that kind of environment.

For the purposes of this discussion, assume that the app myproj.myapp is configured within the container with two sets of settings, SA and SB. Imagine that we have a server running threads T1 and T2, and answering requests R1, R2, ... etc.

Proposed change: Clear django.db.connections on request finish, make django.db.connections thread-local.
Where: django.db.__init__.py
Rationale: Settings may change between requests, including which DATABASE_* settings are labeled with a given key in OTHER_DATABASES. django.db.connections is a LazyConnectionManager, so resetting its _connections property between requests will clear named connections for the next request. Of course, R1 may finish on T1 while R2 is running on T2, so we only want to clear django.db.connections._connections in T1. That means django.db.connections._connections must become a thread-local value.

Proposed change: Make django.db.connection, .backend, etc. proxies
Where: django.db.__init__.py
Rationale: Apply the same isolation to the default connection and its backend, etc. (which are all module-level variables in django.db) as to named connections in django.db.connections. Unfortunately, in this case we don't have a convenient dict to clear out at the end of each request, and users may have 'from django.db import connection' anywhere in their code.
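To make the first proposal a bit more concrete, here is a minimal sketch assuming a much-simplified LazyConnectionManager: the per-name dict lives in a threading.local, and reset() is the function that would be connected to request_finished. The ConnectionInfo class and the get_settings hook are hypothetical stand-ins, not the branch's actual code.

```python
import threading


class ConnectionInfo:
    """Hypothetical stand-in for the branch's ConnectionInfo: it only
    remembers the DATABASE_* settings it was built from."""

    def __init__(self, settings):
        self.settings = settings


class LazyConnectionManager:
    """Simplified sketch: named connections are created on first use and
    stored in a dict that is private to the current thread."""

    def __init__(self, get_settings):
        # get_settings: hypothetical hook returning the OTHER_DATABASES
        # dict for whichever settings (SA, SB, ...) the current request uses.
        self._get_settings = get_settings
        self._local = threading.local()

    @property
    def _connections(self):
        # Each thread lazily gets its own dict on first access.
        if not hasattr(self._local, "connections"):
            self._local.connections = {}
        return self._local.connections

    def __getitem__(self, name):
        conns = self._connections
        if name not in conns:
            conns[name] = ConnectionInfo(self._get_settings()[name])
        return conns[name]

    def reset(self, **kwargs):
        # Would be connected to request_finished. It replaces only the
        # calling thread's dict, so other threads keep their connections.
        self._local.connections = {}
```

Because reset() only rebinds the calling thread's dict, R1 finishing on T1 cannot throw away the connections R2 is still using on T2.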
So, for the default connection, django.db.connection, django.db.backend, etc. must be thread-local and also cleared at the end of each request, but without rebinding their names. All of these variables are references to attributes of django.db.connection_info, which is a ConnectionInfo instance initialized from the default settings (settings.DATABASE_*). The shortest route to making them request- and thread-safe is to make them attributes of a new DefaultConnectionInfoProxy instance instead, in which each attribute delegates to the same attribute of a ConnectionInfo that is stored in a thread-local container and reset at the end of each request.

Proposed change: Move connection information into Manager.db
Where: django.db.models.manager
Rationale: Moving the per-model connection information from model._meta to model._default_manager has already been discussed, and I think it is generally approved; this is just to flesh out a bit more what that will mean. Since the connection for a model will now be wholly specified by settings, the model's default manager will have to look up the connection to use for the model in settings.OTHER_DATABASES, examining the MODELS entry in each value of that dict for the best match to the model.

A manager instance may be shared across threads, so the storage for the current connection must be thread-local. The same manager may also encounter different settings on different requests in the same thread: say, R1 in T1 loads SA but R2, also in T1, loads SB. So not only the ConnectionInfo but the whole connection scenario (named or default, the name of the connection if named, the settings to use) must be reset at the end of each request. To do this, I propose making manager.db a descriptor that serves as a thread-local ConnectionInfo proxy. The proxy can be initialized during contribute_to_class and attached only to the instance (not the Manager class).
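A rough sketch of both proxies follows, assuming simplified stand-ins for ConnectionInfo and Manager; every class and helper below is hypothetical, including the MODELS best-match rule ('app_label.ModelName' beats a bare 'app_label'). One wrinkle worth noting: Python only calls a descriptor's __get__ when the descriptor is found on the class, so an object stored on the manager instance has to act as a plain forwarding proxy (here via __getattr__) rather than as a true descriptor. AttributeProxy shows one possible way for 'from django.db import connection' to keep working without rebinding the name.

```python
import threading


class FakeConnection:
    """Hypothetical stand-in for a DB-API connection."""

    def __init__(self, db_name):
        self.db_name = db_name


class ConnectionInfo:
    """Hypothetical stand-in for the branch's ConnectionInfo."""

    def __init__(self, settings):
        self.settings = settings
        self.connection = FakeConnection(settings["DATABASE_NAME"])


class ThreadLocalConnectionInfoProxy:
    """Forwards attribute access to a per-thread ConnectionInfo that is
    created lazily by `factory` and dropped again by reset()."""

    def __init__(self, factory):
        self._factory = factory  # hypothetical: returns a ConnectionInfo
        self._local = threading.local()

    def _info(self):
        info = getattr(self._local, "info", None)
        if info is None:
            info = self._local.info = self._factory()
        return info

    def __getattr__(self, name):
        # Only reached for names not found normally, e.g. connection.
        return getattr(self._info(), name)

    def reset(self, **kwargs):
        # Meant to be connected to request_finished; affects this thread only.
        self._local.info = None


class AttributeProxy:
    """Stable object for `from django.db import connection`: each lookup is
    forwarded to the current thread's info_proxy.<attr_name>."""

    def __init__(self, info_proxy, attr_name):
        self._info_proxy = info_proxy
        self._attr_name = attr_name

    def __getattr__(self, name):
        return getattr(getattr(self._info_proxy, self._attr_name), name)


def best_match(other_databases, app_label, model_name):
    """Hypothetical matching rule: 'app.Model' in MODELS beats 'app'."""
    best, best_score = None, 0
    for name, db in other_databases.items():
        for entry in db.get("MODELS", ()):
            scores = {"%s.%s" % (app_label, model_name): 2, app_label: 1}
            score = scores.get(entry, 0)
            if score > best_score:
                best, best_score = name, score
    return best


class Manager:
    """Hypothetical manager whose `db` proxy is stored on the instance."""

    def __init__(self, get_settings):
        # get_settings: hypothetical hook returning the active settings dict.
        self._get_settings = get_settings

    def contribute_to_class(self, model, name):
        def factory():
            # Re-run after every reset(), so a later request in the same
            # thread may pick a different (named or default) connection.
            s = self._get_settings()
            others = s.get("OTHER_DATABASES", {})
            # model.app_label is a stand-in for model._meta.app_label.
            named = best_match(others, model.app_label, model.__name__)
            return ConnectionInfo(others[named] if named is not None else s)

        self.model = model
        self.db = ThreadLocalConnectionInfoProxy(factory)
        setattr(model, name, self)
```

With this shape, the cached ConnectionInfo sticks for the rest of the request even if the thread's settings change, and only a reset() lets the next request re-resolve the whole scenario.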
In use, where the current multi-db code has an access pattern like:

    connection = model._meta.connection

the new pattern would be:

    connection = model._default_manager.db.connection

Or, in the manager itself:

    connection = self.db.connection
    backend = self.db.backend

and so on. The model._default_manager.db descriptor will (on first access) initialize and return the correct ConnectionInfo for the current settings state that applies to the manager's model, cache that ConnectionInfo in thread-local storage for the rest of the request, and connect to the request_finished signal to clear the cache. This way, only managers actually used during a request would be reset at the end of the request.

Proposed change: Add a thread-local cache of connections
Where: django.db.__init__.py
Rationale: Efficiency. Settings may change between requests, but the actual database connection picked out by the tuple (DATABASE_ENGINE, DATABASE_NAME, DATABASE_USER, DATABASE_PASSWORD, DATABASE_HOST, DATABASE_PORT) will not. However, we don't want to share a single ConnectionInfo instance among multiple threads. Therefore, if we add a thread-local cache keyed by the tuple of DATABASE_* settings, we can instantiate each needed ConnectionInfo only once per thread. This is an optimization that can wait until the rest of the code is working. It should be fairly easy to implement in ConnectionInfo.__new__.

Thoughts?

JP
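For the last proposal, here is a minimal sketch of a per-thread cache in ConnectionInfo.__new__, again using a hypothetical stand-in class rather than the branch's real one. Python still calls __init__ after __new__ returns a cached instance, so the setup in this sketch is guarded by a flag.

```python
import threading

# The six settings that pick out a physical database connection.
KEY_SETTINGS = ("DATABASE_ENGINE", "DATABASE_NAME", "DATABASE_USER",
                "DATABASE_PASSWORD", "DATABASE_HOST", "DATABASE_PORT")

_cache = threading.local()


class ConnectionInfo:
    """Hypothetical stand-in: __new__ returns a per-thread cached instance
    for any settings dict with the same DATABASE_* values."""

    def __new__(cls, settings):
        key = tuple(settings.get(k, "") for k in KEY_SETTINGS)
        per_thread = getattr(_cache, "infos", None)
        if per_thread is None:
            per_thread = _cache.infos = {}
        info = per_thread.get(key)
        if info is None:
            info = super().__new__(cls)
            info._initialized = False
            per_thread[key] = info
        return info

    def __init__(self, settings):
        # __init__ runs on every ConnectionInfo(...) call, even for a
        # cached instance, so skip the setup after the first time.
        if self._initialized:
            return
        self._initialized = True
        self.settings = dict(settings)
        self.connection = None  # a real version would connect lazily here
```

In this sketch, settings that differ only in non-connection keys such as MODELS map to the same cached instance, while each thread still builds its own.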