Hi, I don't think I've commented on this list before so let me briefly introduce myself. I'm Mike Malone. I live in San Francisco. I'm a developer (primarily web dev) and have some experience working with large clustered databases. I worked for Pownce.com, but moved to Six Apart when they acquired Pownce in November 2008.
I like the idea of a tree-structure since it's simple to understand and implement, but I think there may be cases where having multiple top-level proxies may make sense. As Damien pointed out, the top-level proxies will need to re-reduce / merge the documents from each partition, which may become a bottleneck. Damien pointed out how a tree structure would help to mitigate this problem by moving some of the work to sub-nodes. But couldn't you also add additional top-level proxies (with clients randomly choosing one to communicate with) to increase capacity without requiring a tree structure. This would also remove the top-level proxy as a single point of failure for the system. Mike On Fri, Feb 20, 2009 at 2:55 PM, Chris Anderson <[email protected]> wrote: > On Fri, Feb 20, 2009 at 2:45 PM, Damien Katz <[email protected]> wrote: > > > > On Feb 20, 2009, at 4:37 PM, Stefan Karpinski wrote: > > > >>> > >>> Trees would be overkill except for with very large clusters. > >>> > >> > >>> With CouchDB map views, you need to combine results from every node in > a > >>> big merge sort. If you combine all results at a single node, the single > >>> clients ability to simultaneously pull data and sort data from all > other > >>> nodes may become the bottleneck. So to parallelize, you have multiple > >>> nodes > >>> doing a merge sort of sub nodes , then sending those results to another > >>> node > >>> to be combined further, etc. The same with with the reduce views, but > >>> instead of a merge sort it's just rereducing results. The natural > "shape" > >>> of > >>> that computation is a tree, with only the final root node at the top > >>> being > >>> the bottleneck, but now it has to maintain connections and merge the > sort > >>> values from far fewer nodes. > >>> > >>> -Damien > >> > >> > >> That makes sense and it clarifies one of my questions about this topic. > Is > >> the goal of partitioned clustering to increase performance for very > large > >> data sets, or to increase reliability? It would seem from this answere > >> that > >> the goal is to increase query performance by distributing the query > >> processing, and not to increase reliability. > > > > > > I see partitioning and clustering as 2 different things. Partitioning is > > data partitioning, spreading the data out across nodes, no node having > the > > complete database. Clustering is nodes having the same, or nearly the > same > > data (they might be behind on replicating changes, but otherwise they > have > > the same data). > > > > Partitioning would primarily increase write performance (updates > happening > > concurrently on many nodes) and the size of the data set. Partitioning > helps > > with client read scalability, but only for document reads, not views > > queries. Partitioning alone could reduce reliability, depending how > tolerant > > you are to missing portions of the database. > > > > Clustering would primarily address database reliability (failover), > address > > client read scalability for docs and views. Clustering doesn't help much > > with write performance because even if you spread out the update load, > the > > replication as the cluster syncs up means every node gets the update > anyway. > > It might be useful to deal with update spikes, where you get a bunch of > > updates at once and can wait for the replication delay to get everyone > > synced back up. > > > > For really big, really reliable database, I'd have clusters of > partitions, > > where the database is partitioned N ways, each each partition have at > least > > M identical cluster members. Increase N for larger databases and update > > load, M for higher availability and read load. > > > > Thanks for the clarification. > > Can you say anything about how you see rebalancing working? > > > > -- > Chris Anderson > http://jchris.mfdz.com >
