By the time I opened the issue ticket, I had become convinced that the DB Compiler was effectively an impossible route. I completely agree with your sentiments about implementing a Compiler. I'd go as far as to suggest that a few small documentation changes may be warranted, to explain to future developers that they should not take this route if their database is not "relational enough".
Using different models has some advantages, in that it can take full advantage of the underlying database's capabilities. But it sacrifices compatibility with a significant number of existing Django packages, so, putting aside the additional complexity, it's not the target I'm aiming for. I'm definitely coming to the conclusion that QuerySet is the correct place to start work. I think there are a number of issues this will expose or create, such as issues related to UUID usage, especially as primary keys, and from just a few minutes re-reading the QuerySet class I also think there may be a need to clarify

So far, I've found the following problems related to UUIDs that might get in the way of 'finishing' this work.

Existing issues:
- https://code.djangoproject.com/ticket/24691
- https://code.djangoproject.com/ticket/24954
- https://code.djangoproject.com/ticket/6663

I've identified one new "issue". The current QuerySet API methods `.first()` and `.last()` implicitly assume that primary keys are useful for ordering:
https://docs.djangoproject.com/en/1.9/ref/models/querysets/#first
https://docs.djangoproject.com/en/1.9/ref/models/querysets/#last

I'll raise an issue for this item after giving an opportunity for further discussion here, since I'd like a better idea of how these two QuerySet methods are typically used. I'm currently unsure how often they are called on unordered QuerySet objects. If the current behaviour of implicitly falling back to ordering by the primary key is in heavy use, I will need to take that into account. In the shorter term I have a few possible workarounds in mind to replicate the existing behaviour, but the performance implications of these different methods become significantly more important if the implicit ordering by primary key is heavily used.
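To make the `.first()`/`.last()` concern concrete, here is a minimal pure-Python sketch of the documented rule that an unordered QuerySet is ordered by primary key before the first or last object is taken. `FakeQuerySet`, `Row` and the `created_seq` field are hypothetical stand-ins, not Django classes, and the UUID values are fixed so the result is reproducible. With integer auto-increment keys, primary-key order matches insertion order; with UUID keys it generally does not. Ordering by an explicit insertion-order field is shown as one possible workaround, assuming such a field exists on the model; it is not necessarily one of the workarounds mentioned above.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    """Hypothetical stand-in for a saved model instance (not Django's Model)."""
    pk: object
    name: str
    created_seq: int  # hypothetical explicit insertion counter


class FakeQuerySet:
    """Toy emulation of the documented first()/last() fallback rule.

    Only the "no ordering defined -> order by primary key" behaviour is
    mimicked; this is not Django code.
    """

    def __init__(self, rows: List[Row], ordering: Optional[str] = None):
        self._rows = list(rows)
        self._ordering = ordering

    def order_by(self, field_name: str) -> "FakeQuerySet":
        # Return a new queryset with an explicit ordering, as Django does.
        return FakeQuerySet(self._rows, ordering=field_name)

    def _ordered_rows(self) -> List[Row]:
        # Fall back to the primary key when no ordering was requested.
        key = self._ordering or "pk"
        return sorted(self._rows, key=lambda row: getattr(row, key))

    def first(self) -> Optional[Row]:
        rows = self._ordered_rows()
        return rows[0] if rows else None

    def last(self) -> Optional[Row]:
        rows = self._ordered_rows()
        return rows[-1] if rows else None


# Integer auto-increment keys: pk order matches insertion order.
int_rows = [
    Row(pk=i, name=name, created_seq=i)
    for i, name in enumerate(["a", "b", "c"], start=1)
]

# UUID keys: fixed values chosen so that pk order differs from insertion order.
uuid_rows = [
    Row(pk=uuid.UUID(int=3), name="a", created_seq=1),
    Row(pk=uuid.UUID(int=1), name="b", created_seq=2),
    Row(pk=uuid.UUID(int=2), name="c", created_seq=3),
]

print(FakeQuerySet(int_rows).first().name)   # a: the first row inserted
print(FakeQuerySet(uuid_rows).first().name)  # b: not the first row inserted
print(FakeQuerySet(uuid_rows).order_by("created_seq").first().name)  # a
```

In Django itself, the equivalent explicit choice is to call `order_by()` on the QuerySet, or to set `ordering` in the model's `Meta`, before relying on `.first()` or `.last()`.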
Longer term, it might be good to deprecate the implicit fallback to primary-key ordering by documenting that it cannot be relied upon without an integer primary key, and removing any workarounds that emulate integer-ordering behaviour.

Ticket 6663 was closed quite some time ago. However, to get the most from any attempt to support non-relational databases, via QuerySet or otherwise, it will need to be revisited: either reopened, or a new issue created to address the point I'm about to make, which I feel is encompassed by 6663. I hope to avoid any confusion by being clear about what I feel is covered by this.

The UUIDField that was recently added to Django is not always suitable for use as a database primary key because:
- The UUIDField generates the UUID in Python code, which is less than optimal in some circumstances. Many databases can, or by default do, generate UUID 'primary keys' for documents or rows automatically. It should be possible to let Django defer the creation of the UUID and rely on the database to create UUID primary keys, just as it currently does for auto-incrementing integer primary keys.
- Existing Django applications and libraries were not written with UUID primary keys. Supporting existing Django applications and models is one of my goals, so requiring explicit use of something like `id = UUIDField(primary_key=True)` on a model to make it compatible is a problem for me.

Ticket 6663 was about the ability to use a UUID as the primary key.
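The difference between the two UUID generation strategies can be illustrated with a small in-memory model. Django's UUIDField documentation shows the Python-side approach, `models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)`, where the key exists before the row is saved. The sketch below contrasts that with a database-side default that assigns the key at insert time, the way an auto-increment sequence does. `ToyTable`, `ToyInstance`, `pk_generator` and `pk_default` are hypothetical names for illustration only, and the sketch ignores Django's update-versus-insert logic in `save()`.

```python
import itertools
import uuid
from typing import Callable, Dict, Optional


class ToyTable:
    """Hypothetical in-memory table that can generate primary keys itself.

    pk_generator plays the role of a database-side default, such as an
    auto-increment sequence or a server-side UUID function.
    """

    def __init__(self, pk_generator: Callable[[], object]):
        self._pk_generator = pk_generator
        self.rows: Dict[object, dict] = {}

    def insert(self, values: dict, pk: Optional[object] = None) -> object:
        # pk=None means "let the database choose", as with an AutoField.
        if pk is None:
            pk = self._pk_generator()
        if pk in self.rows:
            raise ValueError("duplicate primary key")
        self.rows[pk] = dict(values)
        return pk  # handed back to the caller after the insert


class ToyInstance:
    """Hypothetical model instance with an optional Python-side pk default."""

    def __init__(self, pk_default: Optional[Callable[[], object]] = None, **values):
        # Python-side default: the pk exists before the row is ever saved,
        # which mirrors UUIDField(default=uuid.uuid4).
        self.pk = pk_default() if pk_default else None
        self.values = values

    def save(self, table: ToyTable) -> None:
        # Database-side default: with no pk set, it is only known after insert.
        self.pk = table.insert(self.values, pk=self.pk)


counter = itertools.count(1)
int_table = ToyTable(pk_generator=lambda: next(counter))
uuid_table = ToyTable(pk_generator=uuid.uuid4)

a = ToyInstance(name="auto-int")
a.save(int_table)       # integer pk assigned by the "database"

b = ToyInstance(name="db-uuid")
b.save(uuid_table)      # UUID pk assigned by the "database"

c = ToyInstance(pk_default=uuid.uuid4, name="python-uuid")
pk_before_save = c.pk   # already set before save()
c.save(uuid_table)
```

The point of the contrast is that the database-side path is the same one integer keys already use, so a UUID default generated by the database would not require the model to declare its own primary key field.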
On the surface, the UUID primary key problem from Ticket 6663 appears solved: we can write `id = UUIDField(primary_key=True)` and we have a UUID as the primary key. What hasn't been addressed is the ability to say "I want to use UUIDs for primary keys". I feel this was the intent behind Ticket 6663, and it should be reopened with an explicit focus on fixing the following two things:
- The default AutoField that Django gives any model that doesn't explicitly define its own id field should not "force" the use of an auto-incrementing integer primary key.
- A mechanism for configuring what kind of primary keys should be used. The two most likely configurations are all-integer primary keys and all-UUID primary keys, so my initial thought is that this mechanism should reside at the public QuerySet API layer, probably as a boolean value set in the QuerySet class's `__init__`.

In addition, for this to be most effective, there needs to be a way to specify that you want to use an alternative QuerySet class. There are several places where one could easily override this for one's own application and models, but there is no convenient way to modify the 'default QuerySet' class provided by `Manager`. My first thought is to modify `BaseManager.from_queryset()` here
https://github.com/django/django/blob/stable/1.9.x/django/db/models/manager.py#L143
so that the definition of `Manager` no longer has to explicitly pass QuerySet, as it does here
https://github.com/django/django/blob/stable/1.9.x/django/db/models/manager.py#L238
However, I'm unfamiliar with the potential impact of such changes, so I would greatly appreciate any feedback on how appropriate this approach would be.

- Sam

On Wednesday, 16 December 2015 15:33:08 UTC+8, Anssi Kääriäinen wrote:

> On Tuesday, December 15, 2015 at 5:43:55 PM UTC+2, Samuel Bishop wrote:
>>
>> Having worked through the code of several Django nosql/alternative
>> database backend libraries, forks, etc...
>>
>> I've noticed that one of the biggest challenges they run into is
>> 'conforming' to many of the things Django expects these lowest layers to do.
>>
>> I opened this ticket https://code.djangoproject.com/ticket/25265 to
>> begin getting feedback on an initial idea for how to 'fix' the problem.
>> Since then I've had further time to ponder the problem and while it still
>> seems to me that the best mechanism is to draw a line between the 'upper'
>> and 'lower' layers of Django,
>> I'm no longer 100% sure the correct place to enable this is the queryset
>> via an additional method, because I've realized that this is not just an
>> opportunity to get NoSQL databases into Django, but also an opportunity to
>> finally provide support for alternative Python ORMs, such as SQLAlchemy.
>>
>> I've been digging around the code for this so I don't mind writing up the
>> code for this, but there is the big question of 'where to decouple' things.
>> Initial feedback in the thread
>> https://code.djangoproject.com/ticket/25265#comment:4
>> has raised the suggestion that moving one layer further up may be the right
>> place to go. It would be very helpful for me to get extra input from Django
>> developers familiar with the QuerySet and Query, before I start writing, so
>> I would love to hear feedback on the idea.
>>
>
> Assume the goal is perfect admin integration with a MongoDB backend. The
> approach can be either:
> 1) Use Django's standard models, create a QuerySet compatible
> MongoDBQuerySet.
> 2) Use completely different models, which respond to the APIs needed by
> Admin. This includes implementing a QuerySet compatible MongoDBQuerySet.
>
> There is a lot more work in 2), but the benefit is that you get to use
> models actually meant to be used with a non-relational backend.
> For example, Django's User, Permission and Group models are implemented in a
> way that makes sense for a relational backend. If you use a relational schema
> on a non-relational database you are going to face big problems if you try to
> run the site with any non-trivial amount of data. For this reason I believe
> 2) to be the right approach.
>
> But, to get there, a QuerySet compatible MongoDBQuerySet is needed
> anyway. Here the choices are those mentioned in
> https://code.djangoproject.com/ticket/25265#comment:4
>
> That is, you can go with Django's QuerySet and Query, and just implement a
> MongoDBCompiler. Or, you can use QuerySet with a MongoDBQuery class. Or,
> finally, you can implement MongoDBQuerySet directly from scratch.
>
> If you implement Compiler or Query, you are targeting internal APIs which
> we *will* change in the future, maybe even in dramatic ways. If you target
> QuerySet, you are targeting a public API that doesn't change often. And,
> even if it changes, you will get a nice deprecation period for the changes.
>
> It might seem a lot of work to start from QuerySet, but for a
> non-relational backend there isn't actually *that* much work involved. Most
> of Django's Query and Compiler classes deal with joins, SQL's NULL
> peculiarities or SQL's way of doing aggregations. All of these are
> non-issues for non-relational backends.
>
> So, I think you should start with implementing a custom QuerySet for your
> wanted backend. You can also try to make it work with all Django models,
> but that approach is very likely to fail. For starters, Django's models use
> an autoincrementing integer field for the primary key, whereas most (if not
> all) non-relational databases use something different. Another interesting
> case is ManyToManyField, which assumes a relational data model.
>
> It is very tempting to go with an approach where you just implement a
> custom Compiler class for your non-relational backend. This would, in
> theory, allow users to run any existing Django application on a
> non-relational database by just using a non-relational backend. The problem
> with this approach is that it doesn't work well enough in practice, and the
> maintenance overhead in the long run is huge.
>
> - Anssi
>

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/589de0c1-55f2-4c01-868b-ae5e353186bb%40googlegroups.com.