Re: [GSOC] Multiple Database API proposal

Malcolm Tredinnick Fri, 20 Mar 2009 21:07:05 -0700

Trimming unused portions of the response to make it readable (which I
should have done the first time around, too)...

On Fri, 2009-03-20 at 23:41 -0400, Alex Gaynor wrote:
> 
> 
> On Fri, Mar 20, 2009 at 11:21 PM, Malcolm Tredinnick
> <malc...@pointy-stick.com> wrote:
>         
>         
>         On Fri, 2009-03-20 at 09:45 -0400, Alex Gaynor wrote:
>         > Hello all,

[...]

>         > The greatest hurdle is changing the connection after we already have
>         > our ``Query`` partly created.  The issues here are that: we might have
>         > done tests against ``connection.features`` already, we might need to
>         > switch either to or from a custom ``Query`` object, amongst other issues.

[...]

>         >  One possible solution that is very powerful (though quite
>         > inelegant) is to have the ``QuerySet`` keep track of all public API
>         > method calls against it and what parameters they took, then when the
>         > ``connection`` is changed it will recreate the ``Query`` object by
>         > creating a "blank" one with the new connection and reapplying all the
>         > methods it has stored.  This is basically a simple implementation of
>         > the command pattern.
>         
>         
>         
>         
>         It's pretty yukky. There's a lot of Python level junk that we
>         intentionally avoid storing in querysets so that they behave properly
>         as persistent data structures (clones are independent copies) and can
>         be pickled without trouble, etc. It would be really bad for
>         performance to reintroduce those (I did a lot of profiling when
>         developing that stuff and tried to throw out as much as possible). I
>         think this fortunately isn't going to be a real issue. I was pretty
>         careful originally to keep the leakage from django.db.connection into
>         the Query class to as few places as possible and mostly when we're
>         creating the SQL.
>         
>         Some cases that might be unavoidable could be replaced with delayed
>         evaluation objects (essentially encapsulating the command pattern
>         just for that fragment), which is a bit cleaner.
>         
> 
> One suggestion Eric Florenzano had was that we go above and beyond
> just storing the methods and parameters: we don't even execute them
> at all until absolutely necessary.

Excuse me for a moment whilst I add Eric to a special list I've been
keeping. He's trying to make trouble.

Ok, back now... There are at least two problems with this.

(a) Backwards incompatible in that some querysets would return
noticeably different results before and after that change. It would be
subtle, quiet and very difficult to detect without auditing every line
of code that contributes to a queryset. The worst kind of change for us
to make from the perspective of the users.

(b) Intentionally not done right now and not because I'm whimsical and
arbitrary (although I am). The problem is it requires storing all sorts
of arbitrarily complex Python objects. Which breaks pickling, which
breaks caching. People tend to complain, a lot, about that last bit.
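Point (a) can be illustrated with a toy example. This is only a sketch:
EagerQuery and DeferredQuery are hypothetical stand-ins invented for this
illustration, not Django classes. It assumes the eager version snapshots
list arguments when filter() is called, while the deferred version records
the call and replays it later, so a caller that mutates the argument in
between quietly gets a different query.

```python
class EagerQuery:
    """Hypothetical stand-in: snapshots filter values when filter() is called."""

    def __init__(self):
        self.conditions = []

    def filter(self, **kwargs):
        for name, value in kwargs.items():
            # Copy lists immediately so later mutation by the caller is not seen.
            if isinstance(value, list):
                value = tuple(value)
            self.conditions.append((name, value))
        return self

    def resolve(self):
        return list(self.conditions)


class DeferredQuery:
    """Hypothetical stand-in: records calls and replays them only on demand."""

    def __init__(self):
        self.calls = []

    def filter(self, **kwargs):
        # Only remember the call; the arguments are kept by reference.
        self.calls.append(("filter", kwargs))
        return self

    def resolve(self):
        # Replay every recorded call against a fresh eager query.
        replayed = EagerQuery()
        for method, kwargs in self.calls:
            getattr(replayed, method)(**kwargs)
        return replayed.resolve()


ids = [1, 2]
eager = EagerQuery().filter(id__in=ids)
deferred = DeferredQuery().filter(id__in=ids)
ids.append(3)  # the caller mutates the argument after building the query

eager_result = eager.resolve()        # still sees (1, 2)
deferred_result = deferred.resolve()  # now sees (1, 2, 3)
```

Nothing raises and nothing warns; the two results simply differ, which is
the "subtle, quiet" kind of change described in (a).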

That's why the Where.add() converts things to more basic types when they
are added (via a filter() command).  If somebody really needs lazily
evaluated parameters, it's easy enough via a custom Q-like object, but
so far nobody has asked for that if they've gotten stuck doing it. It's
even something we could consider adding to Django, although it's not a
no-brainer given the potential to break caching.
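A minimal sketch of the pickling concern, under stated assumptions:
add_lazy() and add_basic() are hypothetical helpers written for this
illustration and are not Django's Where.add() implementation. The lazy
version stores whatever it is handed (here a generator); the basic version
reduces it to a plain list at the time it is added.

```python
import pickle


def can_pickle(obj):
    """Return True if obj survives pickle.dumps(), False otherwise."""
    try:
        pickle.dumps(obj)
    except (TypeError, AttributeError, pickle.PicklingError):
        return False
    return True


def add_lazy(children, name, value):
    # Hypothetical: keep the value exactly as given, however complex it is.
    children.append((name, value))


def add_basic(children, name, value):
    # Hypothetical: convert iterables (other than strings) to a plain list
    # at the moment they are added.
    if not isinstance(value, (str, bytes)) and hasattr(value, "__iter__"):
        value = list(value)
    children.append((name, value))


lazy_children, basic_children = [], []
add_lazy(lazy_children, "id__in", (n for n in range(3)))
add_basic(basic_children, "id__in", (n for n in range(3)))

lazy_ok = can_pickle(lazy_children)    # generators cannot be pickled
basic_ok = can_pickle(basic_children)  # plain lists and tuples can
restored = pickle.loads(pickle.dumps(basic_children))
```

The structure holding only basic types round-trips through pickle, so it
can be cached; the one holding the generator cannot.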

[...]
> 
> Thanks for all the review Malcolm.

No problems.

> One question that I didn't really ask in the initial post is what
> parameters a "DatabaseManager" should receive on its methods. One
> suggestion is the Query object, since that gives the user the maximal
> amount of information; however, my concerns there are that it's not a
> public API, and having a private API as a part of the public API feels
> klunky.

At first glance, I believe the word you're looking for is "wrong". :-)

Definitely a valid concern.

>   OTOH there isn't really another data structure that carries around
> the information that someone writing their sharding logic (or whatever
> other scheme they want to implement) will inevitably want to have.

Two solutions spring to mind, although I haven't thought this through a
lot: it's not particularly germane to the proposal since it's something
we can work out a bit later on. I've got limited time today (something
about a beta release coming up), so I wanted to just get out responses
to the two people who posted items for discussion. I suspect there's a
lot of thinking needed here about the concept as a whole and I want to
do that. Anyway...

One option is to use the piece of public API that is available which
will always be carrying around a Query object: the QuerySet. Query
objects don't exist in isolation. However, this sounds problematic
because the implementation is going to be working at a very low-level --
database managers are only really interesting to Query.as_sql() and its
dependencies. But that leads to the next idea, ...

The other is to work out a better place for this database manager in the
hierarchy. It might be something that lives as an attribute on a
QuerySet. Something like the user provides a function that picks the
database based on "some information" that is available to it and the base
method selects the right database to use. Since it lives in the QuerySet
namespace, it can happily access the "query" attribute there without any
encapsulation violations. The database manager then becomes two pieces,
an algorithm on QuerySet (that might just dispatch to the real algorithm
on Query), plus some user-supplied code to make the right selections.
That latter thing could be a callable object if you need the full class
structure. But the stuff QuerySet/Query needs to know about is probably
a much smaller interface than *requiring* a full class. (Did any of that
make sense?)
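The second idea could be sketched roughly as follows. Everything here is
hypothetical: the toy Query and QuerySet classes, the db_picker attribute,
select_database() and shard_by_user() are names made up for illustration,
and passing the model name plus the filters as the "some information" is
just one assumption about what a picker might need.

```python
class Query:
    """Toy stand-in for the private query object."""

    def __init__(self, model_name):
        self.model_name = model_name
        self.filters = {}


class QuerySet:
    """Toy stand-in: owns a Query plus an optional user-supplied picker."""

    def __init__(self, model_name, db_picker=None):
        self.query = Query(model_name)
        self.db_picker = db_picker

    def filter(self, **kwargs):
        # Clones stay independent: build a new QuerySet with merged filters.
        clone = QuerySet(self.query.model_name, self.db_picker)
        clone.query.filters = {**self.query.filters, **kwargs}
        return clone

    def select_database(self):
        """Base algorithm: ask the user's callable, else fall back to a default."""
        if self.db_picker is None:
            return "default"
        # Only QuerySet touches self.query; the picker sees a small summary.
        return self.db_picker(self.query.model_name, dict(self.query.filters))


def shard_by_user(model_name, filters):
    """User-supplied selection code: pick a shard from the user_id filter."""
    user_id = filters.get("user_id")
    if user_id is None:
        return "default"
    return "shard%d" % (user_id % 2)


sharded = QuerySet("Comment", shard_by_user).filter(user_id=7)
chosen = sharded.select_database()
unfiltered = QuerySet("Comment", shard_by_user).select_database()
no_picker = QuerySet("Comment").filter(user_id=7).select_database()
```

The user-supplied piece is a plain function here; a callable object would
work the same way if the selection logic needs state.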

I think this -- the database manager concept -- is the part of your
proposal that is most up in the air with respect to what the API looks
like. Which is fine. The fact that it's something to consider is good
enough to know. Certainly put some thought into the problem, but don't
sweat the details too much just yet (in the application period). This is
one of those hard areas where you probably do need to think about it so
much it costs you sleep, you forget to eat and so on.

Regards,
Malcolm

