> Recently I found out Django doesn't support multiple databases. That's
> quite surprising.
> 
> Given that limitation, how do you scale out a Django app?

That depends on where your bottlenecks are.  It also depends 
heavily on your read/write usage pattern.  If you're truly 
experiencing a stenosis of the database connection, you have 
several options, but most of them come down to domain-specific 
tuning.
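
A quick way to see whether the database really is the 
bottleneck is to look at what each request actually executes. 
Here's a minimal sketch using Django's per-connection query 
log, which is only populated while DEBUG = True; the view 
being profiled is hypothetical:

    from django.db import connection, reset_queries
    from django.http import HttpResponse

    def my_view(request):              # hypothetical view under test
        return HttpResponse("ok")

    def profiled(request):
        reset_queries()                # start from a clean query log
        response = my_view(request)
        # connection.queries is a list of {'sql': ..., 'time': ...}
        # dicts, populated only while DEBUG = True in settings.
        for q in connection.queries:
            print(q["time"], q["sql"])
        return response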

> Without multi-DB support, most of the usual techniques for scaling out
> such as:
>    - DB sharding
>    - functional partitioning - eg. separate DB servers for user
> profiles, orders, and products
> would be infeasible with django.

Sharding and functional partitioning don't yet exist in stock 
Django.  There's a GSoC project that may make some headway on 
multiple-database support, but I've not heard anything further 
on the Django Developers list regarding that.
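
If/when that lands, functional partitioning mostly reduces to 
a routing layer that picks a connection alias per model.  A 
hypothetical sketch of what such a layer could look like, 
assuming settings that define 'default' and 'orders' database 
aliases (both names illustrative):

    class OrdersRouter:
        """Send the 'orders' app to its own database (functional
        partitioning); everything else falls through to 'default'."""

        def db_for_read(self, model, **hints):
            if model._meta.app_label == "orders":
                return "orders"
            return None            # no opinion -> use 'default'

        def db_for_write(self, model, **hints):
            if model._meta.app_label == "orders":
                return "orders"
            return None

        def allow_relation(self, obj1, obj2, **hints):
            # Joins can't span databases, so forbid cross-partition
            # relations outright.
            return obj1._state.db == obj2._state.db

The same db_for_read hook is also where you'd send reads to a 
replica if you end up going the replication route.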

> I know replication is still available. But that still means all data
> must fit in 1 server.

Well, with bountiful storage options like AoE, SAS, SAN, FC, 
etc., having "all the data fit in one server" isn't a horrible 
constraint.  And with 1TB drives on the market, fitting several 
terabytes in a single machine isn't a disastrous idea.  If you 
have more data than will fit in a single machine, you have a 
lot of other issues and will probably need very specific (and 
likely expensive ;-) help.

> Also replication isn't going to help update performance.

This goes back to my "read/write usage pattern" quip: if you 
have a high volume of reads and a low volume of writes, 
replication is one of the first tools you reach for.  However, 
with a high volume of writes, you've entered the realm of "hard 
problems".  Usually if your app reaches this volume of DB 
traffic, you need a solution specialized to your domain, so 
stock Django may not be much help.  Given that you haven't 
detailed the problem you're actually having (this is where 
profiling comes in), it's hard to offer much beyond the generic 
here.  So answers to some questions might help:

- are you bringing back huge datasets or just small sub-slices of 
your data?

- are you updating large swaths of data at a time, or are you 
just updating single records most of the time?

- are just a few select users doing the updating, while the 
rest of your users do piles of reads?

- how big is this hypothetical DB of yours?

- can you partition by things that are totally unrelated, such 
as by customer, so each customer has their own instance that 
gets put wherever your admins define, letting DNS balance the 
load? (à la Basecamp's customername.basecamp.com; a toy sketch 
appears after this list)

- can you tolerate replication delays?  what time-frame? 
(sub-second?  async taking up to 30 minutes?  a whole day?)

- how readily can you cache things to prevent touching the 
database to begin with?  Can you cache with an HTTP proxy 
front-end for repeated pages?  Can you cache datasets or other 
fragments with memcached?  If your web-app follows good design, 
any GET can be cached based on a subset of its headers.  (A 
sketch of Django's cache hooks follows this list.)
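
On that last point, Django's cache framework gives you hooks 
at both granularities.  A minimal sketch, assuming a memcached 
backend is configured in settings; the view, model, and cache 
key are all illustrative:

    from django.core.cache import cache
    from django.http import HttpResponse
    from django.views.decorators.cache import cache_page

    @cache_page(60 * 5)          # cache the whole rendered page 5 min
    def product_list(request):   # hypothetical view
        return HttpResponse(", ".join(p.name for p in top_products()))

    def top_products():
        # dataset/fragment caching with the low-level API
        data = cache.get("top-products")          # hypothetical key
        if data is None:
            from myapp.models import Product      # hypothetical model
            data = list(Product.objects.order_by("-sales")[:20])
            cache.set("top-products", data, 300)  # 5-minute timeout
        return data

An HTTP proxy in front (Squid, say) can then serve repeated 
pages without the request ever reaching Django, provided your 
views emit sane Cache-Control and Vary headers.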
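
And on the per-customer idea, one toy shape is to derive the 
instance from the request's Host header.  Everything here is 
illustrative, and it assumes one database alias per customer 
already exists in settings:

    import threading

    _local = threading.local()

    class CustomerMiddleware:
        """Remember the customer for this request, taken from the
        subdomain (customername.example.com)."""

        def __init__(self, get_response):
            self.get_response = get_response

        def __call__(self, request):
            _local.customer = request.get_host().split(".")[0]
            return self.get_response(request)

    class CustomerRouter:
        """Route reads and writes to the current customer's DB."""

        def db_for_read(self, model, **hints):
            return getattr(_local, "customer", None)

        db_for_write = db_for_read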

Lastly, read over David Cramer's blog[1] as he's done some nice 
work scaling Django to big deployments and has some helpful tips.

-tim

[1]
http://www.davidcramer.net/category/code/django