Re: NoSQL Support for the ORM

Waldemar Kornewald Wed, 07 Apr 2010 13:43:46 -0700

On Wed, Apr 7, 2010 at 5:12 PM, Alex Gaynor <alex.gay...@gmail.com> wrote:
> No.  I am vehemently opposed to attempting to extensively emulate the
> features of a relational database in a non-relational one.  People
> talk about the "object relational" impedance mismatch, much less the
> "object-relational non-relational" one.  I have no interest in
> attempting to support any attempts at emulating features that just
> don't exist on the databases they're being emulated on.


This decision has to be based on the actual needs of NoSQL developers.
Have you actually worked on non-trivial projects that needed
denormalization, in-memory JOINs, and manually maintained counters?
I'm not making this up. The "dumb" key-value store API is not enough.
People are manually writing lots of code for features that could be
handled by an SQL emulation layer. Do we agree so far?
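
To illustrate the kind of hand-written code meant here, a minimal
sketch follows. It assumes a plain dict as a stand-in for a key-value
store; the names store, put, get and add_comment are hypothetical and
not part of Django or any NoSQL client.

```python
# Hypothetical in-memory key-value store standing in for a NoSQL backend.
store = {}

def put(key, value):
    store[key] = value

def get(key):
    return store[key]

def add_comment(post_key, comment_key, text):
    """Save a comment and hand-maintain what SQL would derive for us."""
    post = get(post_key)
    # Denormalization: copy the post title into the comment so that
    # listing comments with their post title needs no JOIN.
    put(comment_key, {"post": post_key,
                      "post_title": post["title"],
                      "text": text})
    # Manually maintained counter, replacing a COUNT query.
    post["comment_count"] += 1
    put(post_key, post)

put("post:1", {"title": "NoSQL and the ORM", "comment_count": 0})
add_comment("post:1", "comment:1", "First")
add_comment("post:1", "comment:2", "Second")
```

Every write path that touches comments has to repeat this bookkeeping
by hand, and this is exactly the code an emulation layer could own.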

Then, the question boils down to: Is the ORM the right place to handle
those features?

We see more advantages in moving those features into the ORM instead
of some separate API:
No matter whether you do denormalization or an in-memory JOIN, you end
up emulating an SQL-like JOIN. When you're maintaining a counter you
again do a simple and very common operation supported by SQL:
counting. Django's ORM already provides that functionality. Django's
current reusable apps already use that functionality. Developers
already know Django's ORM and thus also that functionality. By moving
these features into the ORM:
* existing Django apps will either work directly on NoSQL or at least
be much easier to port
* Django apps written for NoSQL will be portable across all NoSQL DBs
without any code changes and in the worst case require only minor
changes to switch to SQL
* the resulting code is shorter and easier to understand than with a
separate API which would only add another layer of indirection you'd
have to think about *every* (!) single time you work with models (and
if you have to think about this while writing model code you end up
with potentially a lot more bugs, as is actually the case in practice)
* developers won't have to use and learn a different models API (you'd
only need to learn an API for specifying "optimization" rules, but the
models would still be the same)

App Engine's indexes are not that different from what we propose. Like
many other NoSQL DBs, the datastore doesn't create indexes for all
possible queries. Sometimes you'll need a composite index to make
certain queries work. On Cassandra, CouchDB, Redis, and many other
"crippled" NoSQL DBs you solve this problem by maintaining even the
most trivial DB indexes with manually written indexing *code* (and I
mean *anything* that filters on fields other than the primary key). I
bet five years ago database developers would've called anyone nuts
who'd seriously suggest that nonsense, but somehow the NoSQL hype
makes developers forget about productivity. Anyway, on App Engine,
instead of writing code for those trivial indexes you add a simple
index definition to your index.yaml (actually, it's automatically
generated for you based on the queries you execute) and suddenly the
normal query API supports the respective filter rules transparently
(with exactly the same API; this is in strong contrast to Cassandra,
etc., which also make you manually write code for traversing those
manually implemented indexes! Basically, they make you implement a
little specialized DB for every project. This is no joke, but the
sad truth). Now, our goal is to bring App Engine's indexing
definitions to the next level and allow specifying denormalization and
other "advanced" indexing rules which make more complicated queries
work transparently, again via the same API that everyone already
knows.
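
For comparison, here is a rough sketch of what a manually maintained
index looks like on a store that only supports lookups by primary key.
The dict-based store and the names entities, index_by_author, save and
filter_by_author are assumptions made for illustration only; they are
not the API of Cassandra, CouchDB, Redis or App Engine.

```python
# Hypothetical key-value store: only primary-key lookups are built in.
entities = {}
# Hand-written secondary index: author -> set of primary keys.
index_by_author = {}

def save(key, entity):
    """Store the entity and keep the secondary index in sync by hand."""
    old = entities.get(key)
    if old is not None:
        # Remove the stale index entry left by the previous version.
        index_by_author[old["author"]].discard(key)
    entities[key] = entity
    index_by_author.setdefault(entity["author"], set()).add(key)

def filter_by_author(author):
    """Hand-written index traversal replacing a filter on 'author'."""
    keys = sorted(index_by_author.get(author, ()))
    return [entities[k] for k in keys]

save("post:1", {"author": "alice", "title": "A"})
save("post:2", {"author": "bob", "title": "B"})
save("post:1", {"author": "bob", "title": "A2"})  # author changed
```

With a declarative index definition, both the bookkeeping in save()
and the traversal in filter_by_author() would sit behind the normal
query API instead of in project code.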

Instead of seeing this as object-relational non-relational mapping you
should see this as an object-relational mapping for a type of DB that
needs explicitly specified indexing rules for complex queries (which,
if you really think about it, exactly describes what working with
NoSQL DBs is like).

>> In addition to these changes you'll also need to take care of a few
>> other things:
>>
>> Many NoSQL DBs provide a simple "upsert"-like behavior where on save()
>> they either create a new entity if none exists with that primary key
>> or update the existing entity if one exists. However, on save() Django
>> first checks if an entity exists. This would be inefficient and
>> unnecessary, so the backend should be able to turn that behavior off.
>>
>> On delete() Django also deletes related objects. This can be a costly
>> operation, especially if you have a large number of entities. Also,
>> the queries that collect the related entities can conflict with
>> transaction support at least on App Engine and it might also be very
>> inefficient on HBase. IOW, it's not sufficient to let the user handle
>> the deletion for large datasets. So, non-relational (and maybe also
>> relational) DBs should be able to defer and split up the deletion
>> process into background tasks - which would also simplify the
>> developer's job because he doesn't have to take care of manually
>> writing background tasks for large datasets, so it's a good addition
>> in general.
>>
>
> There is separate work on another ticket to provide a way to declare
> ON_DELETE behavior, though this is a bit of a relational concept it
> seems to me making these easy to customize provides a good way for
> different backends to specify their behavior here.

Hmm, I'm not sure. The requirement is that this works transparently on
all DBs (without manually changing ForeignKeys). The proposed setting
ON_DELETE_HANDLED_BY_DB comes close, but it's still not the same
because we still need Django's code for collecting the related objects
(just at a later point and in groups of maybe 100 entities, so it can
be distributed across multiple background task runs).
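
A minimal sketch of the batching idea, assuming the related keys have
already been collected and that some task queue accepts one batch per
task. The names in_batches, schedule_deletion and enqueue are
hypothetical; this is not Django's deletion code.

```python
from itertools import islice

def in_batches(iterable, size=100):
    """Yield lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

def schedule_deletion(related_keys, enqueue, size=100):
    """Hand each batch of keys to `enqueue`, one background task per batch.

    `enqueue` is a placeholder for whatever task queue the backend offers.
    Returns the number of tasks that were scheduled.
    """
    count = 0
    for batch in in_batches(related_keys, size):
        enqueue(batch)
        count += 1
    return count

# Usage: a plain list stands in for the task queue.
queue = []
tasks = schedule_deletion(range(250), queue.append)
```

Here 250 related keys end up in three tasks of 100, 100 and 50 keys.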

>> I'm not sure how to handle multi-table inheritance. It could be done
>> with JOIN emulation, but this would be very inefficient.
>> Denormalization is IMHO not the answer to this problem, either. Should
>> Django simply fail to execute such a query on those backends or should
>> the user make sure that he doesn't use multi-table inheritance
>> unnecessarily in his code?
>>
>
> There's nothing about MTI that's inherently hard on a non-relational
> database, besides not being able to "select_related" the parent.

What if you filter on one field defined in the parent class and
another field defined in the child class? Emulating this query would
either be very inefficient and (for large datasets) possibly return no
results at all, or require denormalization. I'd find that funny in
the case of MTI because it brings us back to single-table inheritance,
but it might be the only solution that works efficiently on all NoSQL
DBs.
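
A small sketch of the problem, under the assumption that the backend
caps how many entities a single query may fetch. The dict-based
"tables", the field names and fetch_limit are made up for
illustration.

```python
# Hypothetical MTI storage: parent and child rows live in separate
# "tables" that share the primary key.
parents = {
    1: {"name": "Cafe A", "city": "Berlin"},
    2: {"name": "Cafe B", "city": "Berlin"},
    3: {"name": "Pizzeria C", "city": "Berlin"},
}
children = {
    1: {"serves_pizza": False},
    2: {"serves_pizza": False},
    3: {"serves_pizza": True},
}

def filter_mti(city, serves_pizza, fetch_limit):
    """Emulate a filter on a parent field AND a child field."""
    # Step 1: query the parent field only, capped at fetch_limit rows.
    candidates = [pk for pk, row in sorted(parents.items())
                  if row["city"] == city][:fetch_limit]
    # Step 2: in-memory JOIN against the child rows.
    return [pk for pk in candidates
            if children[pk]["serves_pizza"] == serves_pizza]

capped = filter_mti("Berlin", True, fetch_limit=2)
uncapped = filter_mti("Berlin", True, fetch_limit=10)
```

With the cap of 2 the only matching row is never fetched, so the
emulated query comes back empty even though a match exists.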

Bye,
Waldemar Kornewald
