Re: Partitioned Clusters

Ben Browning Sat, 21 Feb 2009 05:30:10 -0800

On Fri, Feb 20, 2009 at 7:34 PM, Chris Anderson <[email protected]> wrote:
> I think so. I think that there could be proxy overlap / redundancy
> across all levels of the tree, and also in the case of a flat tree.
>
> As long as the proxies agree on how to hash from URLs to nodes it
> should just work.


I've been thinking about how to address the issue of allowing
different configurations for different needs. I think if all we do is
tell a proxy node who its children are, how to map IDs to those
children, and allow a proxy to also be a node, we can handle almost
any configuration.

Examples:
* All Peers - 2 nodes in the system, A & B. A is configured so odd IDs
map to A, even IDs map to B. B is configured with the same ID ranges.
You can load-balance across nodes A & B and take advantage of
increased write throughput. This is probably the simplest clustering
scenario for people that don't have enough traffic to fully utilize a
standalone proxy node.

* 1 or more proxies, multiple nodes - The proxies are all configured
identically to map document IDs among nodes A-J. Nodes A-J know
nothing about each other or their parents. In this scenario you can
add very easily add proxy nodes as needed to handle the increased load
when aggregating results from more nodes.

* Tree structure - The top-level proxies are configured to map
document IDs to nodes. These nodes may in fact be other proxies which
are then configured to map to their nodes. Except for multiple levels
of proxies, this is the same as the above scenario.

Does it sound reasonable to expect a proxy to be aware of its children
but not vice-versa? In an actual implementation I see the list of
children and their mappings being stored in a document so that it
could be updated while running to add/remove children.

Adding a child in this scenario would involve choosing an ID range,
replicating the relevant data from the other children, and updating
this mapping. This would depend on partial replication to replicate
only the data needed for the new child. I don't see this as something
that's too complex - the only issue I see is you'll probably need to
replicate data at least twice, once before the proxy mapping is
updated and once after to get any final data that was written to the
other children since the first replication. This also assumes you've
chosen a consistent hashing algorithm so that the data on all nodes
doesn't have to change when adding a single new node.

Removing a child node would be the opposite process. I could foresee
us coming up with a tool to automate most if not all of this process,
possibly only requiring the user to start the new CouchDB server, fill
in some values in Futon for ID mappings, and press a button.


Sound reasonable?

- Ben

Re: Partitioned Clusters

Reply via email to