Re: Partitioned Clusters

Damien Katz Fri, 20 Feb 2009 11:55:53 -0800


On Feb 20, 2009, at 1:55 PM, Stefan Karpinski wrote:

Hi, I thought I'd introduce myself since I'm new here on the couchdb
list. I'm Stefan Karpinski. I've worked in the Monitoring Group at
Akamai, Operations R&D at Citrix Online, and I'm nearly done with a
PhD in computer networking at the moment. So I guess I've thought
about this kind of stuff a bit ;-)

I'm curious what the motivation behind a tree topology is. Not that
it's not a viable approach, just why that and not a load-balancer in
front of a bunch of "leaves" with lateral propagation between the
leaves? Why should the load-balancing/proxying/caching node even be
running couchdb?

One reason I can see for a tree topology would be the hierarchical
cache effect. But that would likely only make sense in certain
circumstances. Being able to configure the topology to meet various
needs, rather than enforcing one particular topology makes more sense
to me overall.


Trees would be overkill except with very large clusters.

With CouchDB map views, you need to combine results from every node in
a big merge sort. If you combine all results at a single node, the
single client's ability to simultaneously pull and sort data from all
other nodes may become the bottleneck. So to parallelize, you have
multiple nodes each doing a merge sort of their sub-nodes, then sending
those results to another node to be combined further, etc. The same
goes for reduce views, but instead of a merge sort it's just rereducing
results. The natural "shape" of that computation is a tree, with only
the final root node at the top being the bottleneck, but now it has to
maintain connections and merge the sorted values from far fewer nodes.
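
To make the shape of that computation concrete, here is a rough Python sketch of a tree-shaped merge. It is not CouchDB code: the names `merge_sorted`, `tree_merge`, `tree_rereduce` and the `fan_in` parameter are made up for illustration, each "node" is simply an in-memory list of (key, value) rows already sorted by key, and the network hops between nodes are left out entirely.

```python
import heapq


def merge_sorted(streams):
    """Merge several key-sorted lists of (key, value) rows into one."""
    return list(heapq.merge(*streams, key=lambda row: row[0]))


def tree_merge(leaf_results, fan_in=2):
    """Merge per-node map results level by level.

    Each hypothetical intermediate node merges at most `fan_in`
    inputs, so no single node has to pull from every leaf.
    """
    level = list(leaf_results)
    while len(level) > 1:
        level = [merge_sorted(level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
    return level[0] if level else []


def tree_rereduce(leaf_values, rereduce, fan_in=2):
    """Same tree shape for reduce views: rereduce instead of merging."""
    level = list(leaf_values)
    while len(level) > 1:
        level = [rereduce(level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
    return level[0] if level else rereduce([])


if __name__ == "__main__":
    leaves = [
        [("a", 1), ("d", 4)],
        [("b", 2)],
        [("c", 3), ("e", 5)],
    ]
    print(tree_merge(leaves, fan_in=2))
    print(tree_rereduce([3, 4, 5, 6, 7], sum, fan_in=2))
```

With a fan-in of k over N leaves, the root merges at most k inputs instead of N, at the cost of roughly log_k(N) levels of intermediate nodes.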


-Damien
