By the time I opened the issue ticket I had become convinced that the DB 
Compiler was effectively an impossible route. I completely agree with your 
sentiments about implementing Compiler. I'd go as far as to suggest that a 
few small documentation changes may be warranted to explain to future 
developers that they should not take this route if their database is not 
"relational enough".

Using different models does have benefits, since they can take full 
advantage of the underlying database's capabilities. But this sacrifices 
compatibility with a significant number of existing Django packages, so, 
putting aside the additional complexity, it's not the target I'm aiming 
for. 

I'm definitely coming to the conclusion that QuerySet is the correct place 
to start work. 
I think this will expose or create a number of issues, such as issues 
related to UUID usage, especially as primary keys, and from just a few 
minutes re-reading the QuerySet class I also think there may be a need 
to clarify  
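For what "starting at QuerySet" might look like in practice, here is a 
deliberately tiny sketch that implements a slice of the public QuerySet API 
(`filter()`, `exclude()`, `order_by()`, `count()`, `exists()` and 
iteration) directly over a document store, with no Query or Compiler 
underneath. `DocumentQuerySet` and the list-of-dicts "collection" are 
hypothetical stand-ins for a real backend, and lookups are exact-match only.

```python
from operator import itemgetter


class DocumentQuerySet:
    """Hypothetical QuerySet-like wrapper over an in-memory list of documents.

    Every chained call returns a new clone and nothing is evaluated until
    iteration, loosely mirroring Django's QuerySet. Only exact-match lookups
    are supported.
    """

    def __init__(self, collection, filters=(), excludes=(), ordering=()):
        self._collection = collection      # stand-in for a backend collection
        self._filters = tuple(filters)     # (field, value) pairs that must match
        self._excludes = tuple(excludes)   # one dict per exclude() call
        self._ordering = tuple(ordering)   # field names; "-" prefix = descending

    def _clone(self, **changes):
        state = {"filters": self._filters, "excludes": self._excludes,
                 "ordering": self._ordering}
        state.update(changes)
        return DocumentQuerySet(self._collection, **state)

    def filter(self, **lookups):
        return self._clone(filters=self._filters + tuple(lookups.items()))

    def exclude(self, **lookups):
        # As in Django, exclude(a=1, b=2) drops documents matching *both*.
        return self._clone(excludes=self._excludes + (lookups,))

    def order_by(self, *fields):
        return self._clone(ordering=fields)

    @property
    def ordered(self):
        return bool(self._ordering)

    def _matches(self, doc):
        if any(doc.get(field) != value for field, value in self._filters):
            return False
        return not any(
            all(doc.get(field) == value for field, value in lookups.items())
            for lookups in self._excludes if lookups
        )

    def __iter__(self):
        docs = [doc for doc in self._collection if self._matches(doc)]
        # Sort by the last key first; Python's sort is stable, so earlier
        # keys in the ordering end up taking precedence.
        for field in reversed(self._ordering):
            docs.sort(key=itemgetter(field.lstrip("-")),
                      reverse=field.startswith("-"))
        return iter(docs)

    def count(self):
        return sum(1 for _ in self)

    def exists(self):
        return any(True for _ in self)
```

Usage would read just like the ORM, e.g. 
`DocumentQuerySet(docs).filter(team="x").order_by("-age")`; the point is 
that everything below the public API is free to be whatever the backend 
needs.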

So far, I've found the following problems related to UUIDs that might get 
in the way of 'finishing' this work.

Existing issues: 
- https://code.djangoproject.com/ticket/24691 
- https://code.djangoproject.com/ticket/24954
- https://code.djangoproject.com/ticket/6663

I've identified one new "issue".
The current QuerySet API methods `.first()` and `.last()` implicitly assume 
that primary keys are useful for ordering: when a QuerySet has no ordering 
defined, both fall back to ordering by the primary key.
https://docs.djangoproject.com/en/1.9/ref/models/querysets/#first / 
https://docs.djangoproject.com/en/1.9/ref/models/querysets/#last 
I'll raise an issue for this after allowing some time for further 
discussion here, since I'd like a better idea of how these two methods are 
typically used. I'm currently unsure how often they are called on 
unordered QuerySet objects. If the implicit fallback to primary key 
ordering is in heavy use, I will need to take that into account. In the 
shorter term I have a few possible workarounds in mind to replicate the 
existing behaviour, but their performance implications become much more 
important if this implicit ordering is heavily used. Longer term, it may be 
worth deprecating this behaviour: document that it cannot be relied upon 
without an integer primary key, and remove any workarounds that emulate 
integer-style ordering.
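To make the concern concrete, here is a plain-Python sketch of that 
fallback (no Django involved): `Row`, `first()` and `last()` are 
hypothetical stand-ins that mimic the documented behaviour of ordering by 
`pk` when no ordering is given. With auto-incrementing integers, pk order 
happens to match creation order; with UUIDs it does not, and an explicit 
ordering on a (hypothetical) `created` column is one possible workaround.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Row:
    pk: object          # int for AutoField-style keys, uuid.UUID otherwise
    created: datetime   # hypothetical creation timestamp column
    name: str


def first(rows, ordering=None):
    """Mimic QuerySet.first(): use the given ordering, else fall back to pk."""
    ordered = sorted(rows, key=ordering or (lambda row: row.pk))
    return ordered[0] if ordered else None


def last(rows, ordering=None):
    """Mimic QuerySet.last(): same fallback, taking the final row instead."""
    ordered = sorted(rows, key=ordering or (lambda row: row.pk))
    return ordered[-1] if ordered else None


older, newer = datetime(2015, 12, 1), datetime(2015, 12, 2)
int_rows = [Row(1, older, "older"), Row(2, newer, "newer")]
uuid_rows = [Row(uuid.UUID(int=2**127), older, "older"),
             Row(uuid.UUID(int=1), newer, "newer")]

print(first(int_rows).name)   # older: integer pks track insertion order
print(first(uuid_rows).name)  # newer: UUID order is unrelated to insertion
print(first(uuid_rows, ordering=lambda row: row.created).name)  # older
```

The explicit-ordering workaround only helps if such a column exists and is 
indexed, which is exactly why the performance question above matters.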

Ticket 6663 was closed quite some time ago. However, to get the most from 
any attempt to support non-relational databases, via QuerySet or 
otherwise, it will need to be revisited: either reopened, or a new issue 
created to address the point I'm about to make, which I feel is 
encompassed by 6663. I hope I can avoid any confusion and be clear about 
what I feel this covers.

The UUIDField that was recently added to Django is not always suitable for 
use as a database primary key because:
- UUIDField values are generated in Python code (typically via a 
`default=uuid.uuid4` callable), which is less than optimal in some 
circumstances. Many databases can, or do, generate document or row UUID 
'primary keys' automatically. It should be possible to let Django defer the 
creation of the UUID and rely on the database to create UUID primary keys, 
just as it currently does for auto-incrementing integer primary keys. 
- Existing Django applications/libraries were not written with UUID primary 
keys. Supporting existing Django applications and models is one of my 
goals, so requiring explicit use of something like `id = 
UUIDField(primary_key=True)` on a model to make it compatible is a problem 
for me. 
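As a rough illustration of the difference, the sketch below uses the 
standard-library `sqlite3` module. One table gets its key from Python 
(roughly what `UUIDField(primary_key=True, default=uuid.uuid4)` does 
today); the other leaves the key to a column default and reads it back 
after the insert, the way an auto-incrementing key is fetched. The table 
names and helper functions are hypothetical, and SQLite's `randomblob()` is 
only a stand-in for a real database-native UUID generator.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# Python-side generation: the application must supply the key on INSERT.
conn.execute("CREATE TABLE item_py (id TEXT PRIMARY KEY, name TEXT)")
# Database-side generation: SQLite has no UUID type or function, so a random
# 16-byte blob rendered as 32 hex digits stands in for a native UUID default.
conn.execute(
    "CREATE TABLE item_db ("
    " id TEXT PRIMARY KEY DEFAULT (lower(hex(randomblob(16)))),"
    " name TEXT)"
)


def insert_python_side(name):
    """Roughly today's UUIDField(default=uuid.uuid4) flow."""
    pk = uuid.uuid4()
    conn.execute("INSERT INTO item_py (id, name) VALUES (?, ?)", (pk.hex, name))
    return pk


def insert_database_side(name):
    """Hypothetical deferred flow: omit the key, then read back what the DB made."""
    cur = conn.execute("INSERT INTO item_db (name) VALUES (?)", (name,))
    # SQLite's implicit rowid locates the freshly inserted row; other
    # databases would use their own mechanism (e.g. RETURNING).
    (raw,) = conn.execute(
        "SELECT id FROM item_db WHERE rowid = ?", (cur.lastrowid,)
    ).fetchone()
    return uuid.UUID(hex=raw)
```

Only the second flow lets the database decide how keys are made, which is 
what AutoField already relies on for integers.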

Ticket 6663 was about the ability to use a UUID as the primary key. On the 
surface this appears solved: we can write `id = 
UUIDField(primary_key=True)` and we have a UUID primary key. What hasn't 
been addressed is the ability to say "I want to use UUIDs for primary 
keys". I feel this was the intent behind Ticket 6663, and it should be 
reopened with an explicit focus on fixing the following two things:
- The default AutoField that Django adds to any model that doesn't 
explicitly declare its own id field should not "force" the use of an 
auto-incrementing integer primary key.
- There should be a mechanism for configuring which kind of primary key is 
used. The two most likely configurations are all-integer primary keys and 
all-UUID primary keys, so my initial thought is that this mechanism should 
live at the public QuerySet API layer, probably as a boolean value set in 
the QuerySet class's `__init__`.
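A minimal sketch of the boolean-in-`__init__` idea, using a plain-Python 
stand-in rather than Django's real QuerySet. `KeyAwareQuerySet`, 
`uuid_pks` and `to_pk()` are hypothetical names. The stand-in `_clone()` 
only approximates, as far as I can tell from the 1.9 source, how clones 
are rebuilt from `model`, `query`, `using` and `hints`.

```python
import uuid


class QuerySet:
    """Stand-in roughly mirroring the 1.9 QuerySet.__init__ signature."""

    def __init__(self, model=None, query=None, using=None, hints=None):
        self.model = model
        self.query = query
        self._db = using
        self._hints = hints or {}

    def _clone(self):
        # Clones are rebuilt from the standard constructor arguments only.
        return self.__class__(model=self.model, query=self.query,
                              using=self._db, hints=self._hints)


class KeyAwareQuerySet(QuerySet):
    """Hypothetical: a boolean chosen in __init__ selects the primary key type."""

    def __init__(self, model=None, query=None, using=None, hints=None,
                 uuid_pks=False):
        super().__init__(model=model, query=query, using=using, hints=hints)
        self.uuid_pks = uuid_pks

    def _clone(self):
        # The extra flag is not a standard constructor argument, so it must
        # be copied onto the clone explicitly or it silently resets to False.
        clone = super()._clone()
        clone.uuid_pks = self.uuid_pks
        return clone

    def to_pk(self, raw):
        """Coerce a key value handed back by the backend to the configured type."""
        return uuid.UUID(str(raw)) if self.uuid_pks else int(raw)
```

The `_clone()` override is the part worth noting: every chained call 
(`filter()`, `order_by()`, ...) produces a clone, so a flag that only 
lives in `__init__` has to be carried across explicitly.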

In addition, for this to be most effective, there needs to be a way to 
specify that you want to use an alternative QuerySet class. There are 
several places where one could very easily override this for one's own 
applications and models, but there is no convenient way to change the 
'default QuerySet' class provided by `Manager`. My first thought is to 
modify `BaseManager.from_queryset()` here 
https://github.com/django/django/blob/stable/1.9.x/django/db/models/manager.py#L143
so that the definition of `Manager` no longer has to pass QuerySet 
explicitly, as it does here 
https://github.com/django/django/blob/stable/1.9.x/django/db/models/manager.py#L238
However, I'm unfamiliar with the potential impact of such changes, so I 
would greatly appreciate any feedback on how appropriate this approach 
would be.
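A toy version of that idea, with no Django imports: `from_queryset()` 
takes an optional QuerySet class and falls back to a default. The 
`default_queryset_class` hook and `DocumentQuerySet` are hypothetical, and 
the proxy loop only approximates what, as I understand it, Django's own 
`_get_queryset_methods()` does.

```python
class QuerySet:
    """Stand-in for Django's QuerySet."""

    def __init__(self, model=None):
        self.model = model

    def all(self):
        return self


class DocumentQuerySet(QuerySet):
    """Hypothetical alternative QuerySet for a non-relational backend."""

    def backend_name(self):
        return "documents"


def _proxy(name):
    """Build a manager method that forwards to the same-named QuerySet method."""
    def method(self, *args, **kwargs):
        return getattr(self.get_queryset(), name)(*args, **kwargs)
    method.__name__ = name
    return method


class BaseManager:
    # Hypothetical project-wide default; Django 1.9 has no such hook.
    default_queryset_class = QuerySet
    _queryset_class = None

    def __init__(self, model=None):
        self.model = model

    def get_queryset(self):
        return self._queryset_class(model=self.model)

    @classmethod
    def from_queryset(cls, queryset_class=None, class_name=None):
        # Proposed change: queryset_class becomes optional and falls back to
        # the configurable default instead of being passed explicitly.
        if queryset_class is None:
            queryset_class = cls.default_queryset_class
        if class_name is None:
            class_name = "%sFrom%s" % (cls.__name__, queryset_class.__name__)
        class_dict = {"_queryset_class": queryset_class}
        # Simplified take on proxying public QuerySet methods onto the manager.
        for name in dir(queryset_class):
            if name.startswith("_") or hasattr(cls, name):
                continue
            if callable(getattr(queryset_class, name)):
                class_dict[name] = _proxy(name)
        return type(class_name, (cls,), class_dict)


# Manager no longer names QuerySet; the default is read when this line runs.
class Manager(BaseManager.from_queryset()):
    pass
```

One thing the toy makes visible: the default is read when 
`from_queryset()` runs, which for `Manager` is import time, so any setting 
controlling it would need to be in place before the manager module is 
imported, or be resolved lazily instead.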

- Sam 


On Wednesday, 16 December 2015 15:33:08 UTC+8, Anssi Kääriäinen wrote:

> On Tuesday, December 15, 2015 at 5:43:55 PM UTC+2, Samuel Bishop wrote:
>>
>> Having worked through the code of several Django nosql/alternative 
>> database backend libraries, forks, etc... 
>>
>> I've noticed that one of the biggest challenges they run into is 
>> 'conforming' to many of the things Django expects these lowest layers to do.
>>
>> I opened this ticket https://code.djangoproject.com/ticket/25265 to 
>> begin getting feedback on an initial idea for how to 'fix' the problem.
>> Since then I've had further time to ponder the problem and while it still 
>> seems to me that the best mechanism is to draw a line between the 'upper' 
>> and 'lower' layers of Django, 
>> I'm no longer 100% sure the correct place to enable this is the queryset 
>> via an additional method, because I've realized that this is not just an 
>> opportunity to get NoSQL databases into Django, but also an opportunity to 
>> finally provide support for alternative Python ORMs, such as SQLAlchemy. 
>>
>> I've been digging around the code, so I don't mind writing up the 
>> implementation, but there is the big question of 'where to decouple' things. 
>> Initial feedback in the thread 
>> https://code.djangoproject.com/ticket/25265#comment:4 
>> has raised the suggestion that moving one layer further up may be the right 
>> place to go. It would be very helpful for me to get extra input from Django 
>> developers familiar with the QuerySet and Query, before I start writing, so 
>> I would love to hear feedback on the idea.
>>
>
> Assume the goal is perfect admin integration with a MongoDB backend. The 
> approach can be either:
> 1) Use Django's standard models, create a QuerySet compatible 
> MongoDBQuerySet.
> 2) Use completely different models, which respond to the APIs needed by 
> Admin. This includes implementing a QuerySet compatible MongoDBQuerySet.
>
> There is a lot more work to 2), but the benefit is that you get to use 
> models actually meant to be used with a non-relational backend. For 
> example, Django's User, Permission and Group models are implemented in a 
> way that makes sense for a relational backend. If you use relational schema 
> on non-relational database you are going to face big problems if you try to 
> run the site with any non-trivial amount of data. For this reason I believe 
> 2) to be the right approach.
>
> But, to get there, a QuerySet compatible MongoDBQuerySet is needed 
> anyways. Here the choices are those mentioned in 
> https://code.djangoproject.com/ticket/25265#comment:4.
>  
> That is, you can go with Django's QuerySet and Query, and just implement a 
> MongoDBCompiler. Or, you can use QuerySet with MongoDBQuery class. Or, 
> finally, you can implement MongoDBQuerySet directly from scratch.
>
> If you implement Compiler or Query, you are targeting internal APIs which 
> we *will* change in the future, maybe even in dramatic ways. If you target 
> QuerySet, you are targeting a public API that doesn't change often. And, 
> even if it changes, you will get a nice deprecation period for the changes.
>
> It might seem a lot of work to start from QuerySet, but for a 
> non-relational backend there isn't actually *that* much work involved. Most 
> of Django's Query and Compiler classes deal with joins, SQL's NULL 
> peculiarities or SQL's way of doing aggregations. All of these are 
> non-issues for non-relational backends.
>
> So, I think you should start with implementing a custom QuerySet for your 
> wanted backend. You can also try to make it work with all Django models, 
> but that approach is very likely to fail. For starters, Django's models use 
> an autoincrementing integer field for primary key, whereas most (if not 
> all) nonrelational databases use something different. Another interesting 
> case is ManyToManyField, which assumes a relational data model.
>
> It is very tempting to go with an approach where you just implement a 
> custom Compiler class for your nonrelational backend. This would, in 
> theory, allow users to run any existing Django application on 
> non-relational database by just using a non-relational backend. The problem 
> with this approach is that it doesn't work well enough in practice, and the 
> maintenance overhead in the long run is huge.
>
>  - Anssi
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/589de0c1-55f2-4c01-868b-ae5e353186bb%40googlegroups.com.