Re: Final Multi-DB status Update

Alex Gaynor Fri, 21 Aug 2009 19:37:12 -0700

Jon, thanks for taking the time to give us some of your thoughts on the
API.  Hearing how people feel about APIs as they are developed is
always a huge boon.


On Wed, Aug 19, 2009 at 5:54 PM, JL<[email protected]> wrote:
>
> Hi Alex,
>
> Thanks so much for the effort you've put into this.  We've begun using
> your code relatively extensively at my work.  For what it's worth,
> we're an enterprise Java shop that offers a software as a service
> product to over 400 customers that are some of the biggest retailers
> and manufacturers in the world.  Because of multi-db we're now able to
> use Django in a useful way to do prototyping, proof of concept work
> and internal tools.  Hopefully in the future, we can move more toward
> Django as well.
>
> Last week over email, you asked me to share my impressions of working
> with the APIs with the larger developer audience.  The rest of this
> email outlines my impressions of working with the API but first a
> brief introduction to how we structure our data to provide some
> context to the others:  Basically we've split our data across multiple
> large DB clusters each with the same schema.  Our data is then
> "sharded" (I use the term loosely) by customer.  Essentially, you
> select the database cluster to work on by Customer then perform almost
> all of your operations against the selected cluster.
>
> Likes of the Multi-db API:
>
> - Definition of the databases in the settings file.  Love the way it's
> handled in a Python dictionary.  Clean, elegant, and I can label
> each db with the actual name of the cluster I'm hitting.  I can define
> "default" as my local application db, which contains my
> customer->cluster mapping, and then the actual clusters themselves.
>
> - 'using' meta setting for models to set the default database.  While
> this makes the models less portable and apps a little less re-usable,
> it works for my use case.  I can leave this setting off for models
> that hit my local application db.  I've created a second application
> that is essentially just a container for models that attach to our
> clusters.  In this application I set the default 'using' to be our
> first cluster.  Ultimately though, I question the usefulness of this
> setting as it will be hard to use for external applications you may
> have imported into your projects without changing their code base.
> This will make upgrading any open source projects you're using a little
> more difficult.
>
> - 'using' queryset function.  It's elegant and reads well.  I do wish
> there was something more stateful that I could use (see later)
>
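
For readers following along, the layout described above might look
roughly like the sketch below.  The cluster labels other than 'c5', the
keys inside each entry, the CUSTOMER_CLUSTER table and the cluster_for()
helper are all made up for illustration; in the setup Jon describes, the
customer->cluster mapping lives in the 'default' database rather than in
a dict.

```python
# Hypothetical settings-style dictionary: one entry per database, keyed by
# the label later passed to using().  The inner keys are placeholders, not
# necessarily the names the multi-db branch expects.
DATABASES = {
    "default": {"engine": "mysql", "name": "app_local"},  # local application db
    "c1": {"engine": "mysql", "name": "cluster_1"},
    "c5": {"engine": "mysql", "name": "cluster_5"},
}

# Stand-in for the customer -> cluster mapping kept in the default db.
CUSTOMER_CLUSTER = {"acme": "c5", "globex": "c1"}


def cluster_for(customer_slug):
    """Return the database label for a customer, failing loudly if unknown."""
    alias = CUSTOMER_CLUSTER[customer_slug]
    if alias not in DATABASES:
        raise KeyError("no database configured for %r" % alias)
    return alias
```

With something like this, the label handed to using() can be computed
per request, e.g. Product.objects.using(cluster_for('acme')), instead of
being hard-coded at each call site.
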
> Dislikes of multi-db API:
>
> - Following foreign key relationships can be tricky and require care
> both in the forward and reverse cases.  For example, in the forward
> case, imagine I have a Customer and I know that customer exists on DB
> Cluster 5 (c5 from here on out).  I also have a Product and a Category
> model where the Product model has a foreign key field pointed at the
> Category model and another foreign key field pointed at Customer.  I
> can query for a product for a given customer doing something like
> this:
>
> prd = Product.objects.using('c5').get(customer_ref=mycustomer,
>                                       slug=my_product_slug)
>
> Now I might want to go and retrieve the Category that the product
> belongs to... maybe to do some output or to find the top n-products or
> something to output in a template.  My initial gut would tell me to do
> something like this:
>
> top_products_in_category = prd.category_ref.get_top_products()
>
> But that would be wrong.  Why?  Because when I try to look up
> prd.category_ref, Django checks against the default cluster instead of
> c5, where this customer and all its data live, and the ORM will
> throw a DoesNotExist exception.  Of course, I can look up my
> category and find my top products like this:
>
> top_products_in_category = Category.objects.using('c5').get(
>     pk=prd.category_ref_id).get_top_products()
>
> but this is clunky and counterintuitive to the way relationship fields
> are supposed to work.  Where I could see this getting really hairy is
> in templates.  For example, let's say I'm rendering a product page that
> I'd like to include other top products from the category on.  I'm
> kinda screwed here since I can't dereference the pointer back to the
> Category table properly.  (Let's not even get into how I'd write the
> 'get_top_products' function at all since that would also have to know
> about what cluster the current data set is living on).
>
> This is also a problem (though much less so) when following the
> reverse relations.  At least here, since I'm working with a queryset,
> I can apply a 'using' clause.  So imagining I had a category object
> and wanted to follow it to products, I could do something like this:
>
> prds = my_category.product_ref_set.using('c5').all()
>
> Of course, trying to use this in a template still falls short since I
> can't pass a parameter to a function in a template.
>
> I want a Pony:
>
> A great way to fix the above would be to have objects and querysets
> 'remember' where they originated from and apply it forward to any
> requests to related objects or reverse relation lookups.  So if I did
> this query:
>
> prds = Product.objects.using('c5').all()
>
> Anytime I reference a product in the prds queryset, it knows it came
> from c5 so all related look ups aim at that DB.  Now when I do things
> like:
>
> cat = prds[0].category_ref
>
> Django should know that 'prds' was generated from looking at db c5 and
> it should attempt to look up relationships there first.  The reverse
> API should also be available somehow so I can override where a foreign
> key lookup is done (much like I can already do on reverse look ups
> with an extra 'where' clause).
>
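
The "remember where it came from" behaviour requested above can be
illustrated with a toy model.  This is not Django's ORM and not code
from the branch: STORES, Row, fetch() and related() are hypothetical
names, and the "databases" are plain dictionaries.  It only shows the
rule being asked for: related lookups default to the originating
database, with an explicit override still available.

```python
class DoesNotExist(Exception):
    pass


# Toy "databases": alias -> table name -> {pk: row}.
STORES = {
    "default": {"category": {}, "product": {}},
    "c5": {
        "category": {1: {"name": "Shoes"}},
        "product": {10: {"slug": "boot", "category_ref_id": 1}},
    },
}


class Row(object):
    """A fetched row that remembers which database it was loaded from."""

    def __init__(self, table, pk, data, origin):
        self.table, self.pk, self.data, self.origin = table, pk, data, origin

    def related(self, fk_field, target_table, using=None):
        # Follow a foreign key on the originating database unless the
        # caller explicitly overrides it with using=...
        alias = using or self.origin
        return fetch(target_table, self.data[fk_field], alias)


def fetch(table, pk, alias="default"):
    """Load one row from the named store and tag it with that alias."""
    try:
        data = STORES[alias][table][pk]
    except KeyError:
        raise DoesNotExist("%s pk=%r not found in %r" % (table, pk, alias))
    return Row(table, pk, data, origin=alias)


prd = fetch("product", 10, "c5")
cat = prd.related("category_ref_id", "category")  # looked up on c5
```

Passing using="default" to related() reproduces the failure described
earlier in the message: the category does not exist there, so
DoesNotExist is raised.
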

This idea sort of appeals to me, but I fear there are some edge cases
whose behavior I'd want to figure out first.  For example, let's say I
have a model with Meta.using = 'db1' and a second model with Meta.using =
'db2'.  When I follow a foreign key from an instance of Model1 to
Model2, which DB should it use: the one the instance came from, or the
default for Model2?
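
To make the question concrete, here is a small sketch of the two
candidate rules.  The function name, its parameters and the "prefer"
switch are hypothetical; neither rule has been chosen for the branch,
and this only spells out what each choice would mean.

```python
def resolve_db(origin, target_meta_using=None, prefer="origin"):
    """Pick the database alias for a related-object lookup.

    origin            -- alias the source object was loaded from
    target_meta_using -- the related model's Meta.using, or None if unset
    prefer            -- hypothetical policy switch: "origin" or "meta"
    """
    if prefer not in ("origin", "meta"):
        raise ValueError("unknown policy %r" % prefer)
    if target_meta_using is None:
        # No conflict: the related model has no default of its own.
        return origin
    return origin if prefer == "origin" else target_meta_using
```

For the Model1/Model2 case, resolve_db('db1', 'db2') picks 'db1' under
the "origin" rule, while resolve_db('db1', 'db2', prefer='meta') picks
'db2'.  The two rules only disagree when the related model sets its own
Meta.using.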

> One more pony request: I should be able to spec a 'using' in any of
> the short cut functions too ... like get_object_or_404 (you may have
> already done this).
>

I'd prefer not to alter get_object_or_404 (or similar helpers).
However, don't forget that in addition to a Model, get_object_or_404
can take a QuerySet, so you can do:

get_object_or_404(MyModel.objects.using('db2'), pk=pk)

And it'll do what you want :)

> Regardless of your current status with the branch, I fully intend to
> keep on using this code.  Luckily, we're all mysql based here so I
> don't have to worry about the custom back end stuff :)  Thanks again
> for your effort this summer Alex!
>

Great to hear!

> Jon.
>
> On Aug 18, 3:24 am, Alex Gaynor <[email protected]> wrote:
>> Hey all,
>>
>> It seems GSOC has finally come to a close and so I'm giving my final
>> status update as a part of GSOC (but I'm not going anywhere!).  When
>> we last left off I had just gotten Oracle support working, however
>> after reviewing with Russ we agreed that the solution was a good bit
>> too hacky, and the real root of the problem was that the Query class
>> has 2 functions, one to record information and build a Pythonic
>> representation of a Query, which is the same for all SQL backends, as
>> well as to actually generate SQL from this representation, which is
>> different in the case of Oracle and others.  Therefore the solution is
>> to actually split these up into separate classes, so we can swap out
>> SQL generators without needing to care about the data collector.  In
>> short that's what I've been working on.  Unfortunately this isn't done
>> at the time of writing (and the end of GSOC), however as I said the
>> code basically works now, it's just not in a form that would end up
>> back in Django.  But, as I said, I'm not going anywhere.  I'm going to
>> continue to work on this problem, and I'll continue to check in with
>> django-developers as design decisions and complications come up.
>>
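
The split described above, one class that records the query and
separate classes that generate SQL from it, can be sketched as follows.
This is a toy illustration rather than the code in the branch: the class
names, the filter()/as_sql() methods and the generated SQL are
assumptions, and ROWNUM wrapping is used only as a familiar example of
Oracle needing different SQL for a row limit.

```python
class Query(object):
    """Backend-neutral record of what was asked for; generates no SQL."""

    def __init__(self, table, limit=None):
        self.table = table
        self.filters = {}
        self.limit = limit

    def filter(self, **conditions):
        self.filters.update(conditions)
        return self


class Compiler(object):
    """Turns a Query into (sql, params) using LIMIT-style row limiting."""

    def where(self, query):
        names = sorted(query.filters)
        clause = " AND ".join("%s = %%s" % n for n in names)
        params = [query.filters[n] for n in names]
        return (" WHERE " + clause if clause else ""), params

    def as_sql(self, query):
        where, params = self.where(query)
        sql = "SELECT * FROM %s%s" % (query.table, where)
        if query.limit is not None:
            sql += " LIMIT %d" % query.limit
        return sql, params


class OracleStyleCompiler(Compiler):
    """Same Query, different SQL: wraps the select and filters on ROWNUM."""

    def as_sql(self, query):
        where, params = self.where(query)
        sql = "SELECT * FROM %s%s" % (query.table, where)
        if query.limit is not None:
            sql = "SELECT * FROM (%s) WHERE ROWNUM <= %d" % (sql, query.limit)
        return sql, params


q = Query("product", limit=5).filter(slug="boot")
```

The same q can be handed to Compiler().as_sql(q) or
OracleStyleCompiler().as_sql(q); only the generator changes, which is
the point of separating the two roles.
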
>> For now, thanks for all the useful ideas, constructive criticism, and
>> words of encouragement django-developers has provided as I've worked
>> on this.
>>
>> Alex
>>
>> --
>> "I disapprove of what you say, but I will defend to the death your
>> right to say it." -- Voltaire
>> "The people's good is the highest law." -- Cicero
>> "Code can always be simpler than you think, but never as simple as you
>> want" -- Me
>

If anyone has any thoughts on the above "corner case" of that possible
API, or on any part of the idea of remembering where an object comes
from (this would also be true for saving and deleting I think), I'd
love to hear them.

Alex

-- 
"I disapprove of what you say, but I will defend to the death your
right to say it." -- Voltaire
"The people's good is the highest law." -- Cicero
"Code can always be simpler than you think, but never as simple as you
want" -- Me

