On Feb 20, 2009, at 1:55 PM, Stefan Karpinski wrote:

Hi, I thought I'd introduce myself since I'm new here on the couchdb
list. I'm Stefan Karpinski. I've worked in the Monitoring Group at
Akamai, Operations R&D at Citrix Online, and I'm nearly done with a
PhD in computer networking at the moment. So I guess I've thought
about this kind of stuff a bit ;-)

I'm curious what the motivation behind a tree topology is. Not that
it's not a viable approach, just why that and not a load-balancer in
front of a bunch of "leaves" with lateral propagation between the
leaves? Why should the load-balancing/proxying/caching node even be
running couchdb?

One reason I can see for a tree topology would be the hierarchical
cache effect. But that would likely only make sense in certain
circumstances. Being able to configure the topology to meet various
needs, rather than enforcing one particular topology, makes more sense
to me overall.

Trees would be overkill except for very large clusters.

With CouchDB map views, you need to combine results from every node in a big merge sort. If you combine all results at a single node, that node's ability to simultaneously pull and sort data from all the other nodes may become the bottleneck. So to parallelize, you have multiple nodes each doing a merge sort of their sub-nodes, then sending those results to another node to be combined further, and so on. The same goes for reduce views, but instead of a merge sort it's just rereducing results. The natural "shape" of that computation is a tree. The root node at the top is still the bottleneck, but now it only has to maintain connections to, and merge the sorted values from, far fewer nodes.
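
To make the shape concrete, here is a minimal Python sketch of the idea. It is not CouchDB code: the names `merge_sorted` and `tree_combine`, the `fanout` parameter, and the in-memory lists standing in for per-node view results are all assumptions made for illustration. Each call to `combine` plays the role of one intermediate node, which only ever sees `fanout` inputs.

```python
import heapq

def merge_sorted(streams):
    """k-way merge of already-sorted (key, value) rows (the map-view case)."""
    return list(heapq.merge(*streams, key=lambda row: row[0]))

def tree_combine(leaves, combine, fanout=2):
    """Combine leaf results level by level.

    Each combine() call stands in for one intermediate node and receives
    at most `fanout` inputs, so no single node talks to every leaf.
    """
    level = list(leaves)
    if not level:
        return combine([])
    while len(level) > 1:
        level = [combine(level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return level[0]

# Map view: every (hypothetical) leaf node returns rows sorted by key.
leaf_rows = [
    [("a", 1), ("d", 4)],
    [("b", 2)],
    [("c", 3), ("e", 5)],
    [("a", 9)],
]
merged = tree_combine(leaf_rows, merge_sorted, fanout=2)

# Reduce view: every leaf returns a partial count; rereduce is just sum().
partial_counts = [3, 5, 2, 7, 1]
total = tree_combine(partial_counts, sum, fanout=2)
```

With a fanout of 2 and four leaves, the root merges two sorted streams instead of four. In a real cluster the fanout would presumably be much larger, and the rows would be streamed rather than held in lists.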

-Damien
