On Wed, Apr 7, 2010 at 4:43 AM, Waldemar Kornewald <wkornew...@gmail.com> wrote:
> Hey Alex,
>
> On Apr 7, 2:11 am, Alex Gaynor <alex.gay...@gmail.com> wrote:
>> Non-relational database support for the Django ORM
>> ==================================================
>>
>> Note:  I am withdrawing my proposal on template compilation.  Another
>> student has expressed some interest in working on it, and in any event
>> I am now more interested in working on this project.
>
> It's great that you want to work on this project. Since I want to see
> this feature in Django, I'm offering mentoring help with the NoSQL
> part. You know Django's ORM better than me, so I probably can't really
> help you there, but I can help to make sure that your modifications
> will work well on NoSQL DBs. Just in case this is necessary, I'll
> apply as a GSoC mentor before it's too late (if I remember correctly,
> in 2007 we could still allow new mentors even at this late stage)?
>
>> Method
>> ~~~~~~
>>
>> The ORM architecture currently has a ``QuerySet`` which is backend
>> agnostic, a ``Query`` which is SQL specific, and a ``SQLCompiler``
>> which is backend specific (i.e. Oracle vs. MySQL vs. generic).  The
>> plan is to change ``Query`` to be backend agnostic by delaying the
>> creation of structures that are SQL specific, specifically join/alias
>> data.  Instead of structures like ``self.where``,
>> ``self.join_aliases``, or ``self.select`` all working in terms of
>> joins and table aliases the composition of a query would be stored in
>> terms of a tree containing the "raw" filters, as passed to the filter
>> calls, with things like ``Field.get_prep_value`` called appropriately.
>> The ``SQLCompiler`` will be responsible for computing the joins for
>> all of these data-structures.
>
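
For illustration, a minimal sketch of what a tree of "raw" filters could
look like, and how a SQL-specific compiler might derive the joins from it
afterwards. The names ``FilterNode`` and ``relation_paths`` and the
``relations`` argument are hypothetical and are not part of Django's ORM;
the proposal does not specify the actual data structures.

```python
from dataclasses import dataclass, field
from typing import Any, List, Set, Tuple, Union


@dataclass
class FilterNode:
    """Hypothetical backend-agnostic node: stores lookups as passed to filter()."""
    connector: str = "AND"   # "AND" or "OR"
    negated: bool = False
    children: List[Union["FilterNode", Tuple[str, Any]]] = field(default_factory=list)

    def add(self, lookup: str, value: Any) -> "FilterNode":
        # Keep the raw lookup string (e.g. "author__name"); no joins or aliases yet.
        self.children.append((lookup, value))
        return self


def relation_paths(node: FilterNode, relations: Set[str]) -> Set[str]:
    """Collect the relation prefixes a SQL compiler would have to turn into joins.

    `relations` is an assumed set of field names that point at other models.
    """
    paths: Set[str] = set()
    for child in node.children:
        if isinstance(child, FilterNode):
            paths |= relation_paths(child, relations)
            continue
        lookup, _value = child
        prefix: List[str] = []
        for part in lookup.split("__"):
            if part not in relations:
                break  # reached a plain field or lookup type
            prefix.append(part)
            paths.add("__".join(prefix))
    return paths


tree = FilterNode().add("author__name", "Alex").add("title", "Django")
joins_needed = relation_paths(tree, {"author"})
```

In this sketch only the SQL side ever calls ``relation_paths``; a
non-relational backend could walk the same tree and simply refuse any
lookup that spans a relation.
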
> Could you please elaborate on the data structures? In the end, non-
> relational backends shouldn't have to reproduce large parts of the
> SQLQuery code just to emulate a JOIN. When we tried to do a similar
> refactoring we quickly faced the problem that we needed something
> similar to setup_joins() and other SQLQuery features. We'd also have
> to create code for grouping filters into individual queries on tables.
> The Query class should take care of as much of the common stuff as
> possible, so nonrel backends can potentially emulate every single SQL
> feature (e.g., via MapReduce or whatever) with the least effort.
> Otherwise this refactoring would actually have more disadvantages than
> our current SQLCompiler-based approach in Django-nonrel (as ridiculous
> as that sounds).
>
> However, it's important that all of the emulated features are handled
> not by the backend, but by a reusable code layer which sits on top of
> the nonrel backends. It would be wasteful to let every backend
> developer write his own JOIN emulation and denormalization and
> aggregate code, etc. The refactored ORM should at least still allow
> for writing some kind of "proxy" backend that sits on top of the
> actual nonrel backend and takes care of SQL features emulation. I'm
> not sure if it's a good idea to integrate the emulation into Django
> itself because then progress will be slowed down.
>
> Ideally, we should provide a simplified API for nonrel backends,
> similar to the one that we recently published for Django-nonrel, so a
> backend could be written in two days instead of two weeks. We can port
> our work over to the refactored ORM, so you don't have to deal
> with this (except if it should be officially integrated into Django).
>

No.  I am vehemently opposed to attempting to extensively emulate the
features of a relational database in a non-relational one.  People
talk about the "object-relational" impedance mismatch, let alone an
"object-relational non-relational" one.  I have no interest in
supporting attempts to emulate features that just don't exist on the
databases they're being emulated on.

> In addition to these changes you'll also need to take care of a few
> other things:
>
> Many NoSQL DBs provide a simple "upsert"-like behavior where on save()
> they either create a new entity if none exists with that primary key
> or update the existing entity if one exists. However, on save() Django
> first checks if an entity exists. This would be inefficient and
> unnecessary, so the backend should be able to turn that behavior off.
>
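
The save() behavior described above can be sketched with a toy in-memory
store. ``DictBackend``, its ``supports_upsert`` flag, and the ``save``
helper are hypothetical names for illustration only; this is not how
Django's ``Model.save()`` is implemented.

```python
class DictBackend:
    """Toy key-value store standing in for a NoSQL backend (hypothetical)."""

    supports_upsert = True  # assumed capability flag, not a real Django attribute

    def __init__(self):
        self.rows = {}
        self.reads = 0  # counts existence checks so the saving is visible

    def exists(self, pk):
        self.reads += 1
        return pk in self.rows

    def insert(self, pk, data):
        self.rows[pk] = dict(data)

    def update(self, pk, data):
        self.rows[pk].update(data)

    def put(self, pk, data):
        # Create-or-overwrite in a single operation.
        self.rows[pk] = dict(data)


def save(backend, pk, data):
    """Skip the existence check when the backend can upsert."""
    if getattr(backend, "supports_upsert", False):
        backend.put(pk, data)
        return
    # Fallback: check first, then update or insert.
    if backend.exists(pk):
        backend.update(pk, data)
    else:
        backend.insert(pk, data)
```

With the flag set, two consecutive saves of the same key perform no reads
at all; with it cleared, each save costs one existence check.
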
> On delete() Django also deletes related objects. This can be a costly
> operation, especially if you have a large number of entities. Also,
> the queries that collect the related entities can conflict with
> transaction support at least on App Engine and it might also be very
> inefficient on HBase. IOW, it's not sufficient to let the user handle
> the deletion for large datasets. So, non-relational (and maybe also
> relational) DBs should be able to defer and split up the deletion
> process into background tasks - which would also simplify the
> developer's job because he doesn't have to take care of manually
> writing background tasks for large datasets, so it's a good addition
> in general.
>
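
A minimal sketch of the deferred, split-up deletion described above,
assuming a plain dict as the datastore and an in-process queue in place of
a real background task system. ``plan_deletion`` and ``run_deferred`` are
hypothetical helpers, not existing Django or App Engine APIs.

```python
from collections import deque


def plan_deletion(related_keys, batch_size):
    """Split the keys of related entities into batches, one per background task."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    keys = list(related_keys)
    return [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]


def run_deferred(store, batches):
    """Process the batches one at a time; returns the number of tasks run.

    A real system would hand each batch to a task queue instead of looping here.
    """
    queue = deque(batches)
    tasks_run = 0
    while queue:
        batch = queue.popleft()
        for key in batch:
            store.pop(key, None)  # tolerate keys that are already gone
        tasks_run += 1
    return tasks_run
```

Keeping each batch small is what lets the work fit inside per-request or
per-transaction limits; the right batch size depends on the backend.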

There is separate work on another ticket to provide a way to declare
ON_DELETE behavior.  Though this is a bit of a relational concept, it
seems to me that making these behaviors easy to customize provides a
good way for different backends to specify their behavior here.

> I'm not sure how to handle multi-table inheritance. It could be done
> with JOIN emulation, but this would be very inefficient.
> Denormalization is IMHO not the answer to this problem, either. Should
> Django simply fail to execute such a query on those backends or should
> the user make sure that he doesn't use multi-table inheritance
> unnecessarily in his code?
>

There's nothing about MTI that's inherently hard on a non-relational
database, besides not being able to "select_related" the parent.
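
As a rough sketch of that point, assume parent and child rows live in
separate collections that share a primary key. Loading a child then takes
two key lookups instead of one joined query. The ``load_child`` helper and
the sample ``places``/``restaurants`` data are hypothetical.

```python
def load_child(parents, children, pk):
    """Fetch a multi-table-inheritance instance without a join.

    Two lookups by key: one for the child row, one for the parent row.
    The second lookup is the round trip select_related would have avoided.
    """
    child_row = children[pk]
    parent_row = parents[pk]
    merged = dict(parent_row)
    merged.update(child_row)
    return merged


places = {1: {"name": "Luigi's", "address": "1 Main St"}}   # parent collection
restaurants = {1: {"serves_pizza": True}}                   # child collection, same key

restaurant = load_child(places, restaurants, 1)
```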

> Bye,
> Waldemar Kornewald
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Django developers" group.
> To post to this group, send email to django-develop...@googlegroups.com.
> To unsubscribe from this group, send email to 
> django-developers+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/django-developers?hl=en.
>
>

Alex

-- 
"I disapprove of what you say, but I will defend to the death your
right to say it." -- Voltaire
"The people's good is the highest law." -- Cicero
"Code can always be simpler than you think, but never as simple as you
want" -- Me
