On Fri, Feb 20, 2009 at 7:34 PM, Chris Anderson <[email protected]> wrote:
> I think so. I think that there could be proxy overlap / redundancy
> across all levels of the tree, and also in the case of a flat tree.
>
> As long as the proxies agree on how to hash from URLs to nodes it
> should just work.

I've been thinking about how to address the issue of allowing different configurations for different needs. I think if all we do is tell a proxy node who its children are, how to map IDs to those children, and allow a proxy to also be a node, we can handle almost any configuration. Examples:

* All Peers - 2 nodes in the system, A & B. A is configured so odd IDs map to A and even IDs map to B. B is configured with the same ID ranges. You can load-balance across nodes A & B and take advantage of increased write throughput. This is probably the simplest clustering scenario for people who don't have enough traffic to fully utilize a standalone proxy node.

* 1 or more proxies, multiple nodes - The proxies are all configured identically to map document IDs among nodes A-J. Nodes A-J know nothing about each other or their parents. In this scenario you can very easily add proxy nodes as needed to handle the increased load of aggregating results from more nodes.

* Tree structure - The top-level proxies are configured to map document IDs to nodes. These nodes may in fact be other proxies, which are in turn configured to map to their own nodes. Except for the multiple levels of proxies, this is the same as the scenario above.

Does it sound reasonable to expect a proxy to be aware of its children but not vice-versa?

In an actual implementation I see the list of children and their mappings being stored in a document, so that it could be updated while running to add/remove children. Adding a child in this scenario would involve choosing an ID range, replicating the relevant data from the other children, and updating this mapping. This would depend on partial replication, so that only the data needed for the new child is replicated.

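To make the "proxy knows its children and how to map IDs to them" idea concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the names (`key_for`, `child_for_key`, `resolve`), the use of CRC32 to turn a document ID into a number, and the choice of ranges over the hash space rather than the odd/even rule are all made up, and none of it is CouchDB code. A mapping is just a list of (low, high, child) ranges, and a child may itself be a proxy with its own mapping.

```python
import zlib

HASH_SPACE = 2 ** 32          # zlib.crc32 returns values in [0, 2**32)
HALF = HASH_SPACE // 2
QUARTER = HASH_SPACE // 4


def key_for(doc_id):
    """Turn a document ID into a stable integer key (CRC32 is an arbitrary choice)."""
    return zlib.crc32(doc_id.encode("utf-8"))


def child_for_key(mapping, key):
    """Return the child whose half-open range [low, high) contains the key."""
    for low, high, child in mapping:
        if low <= key < high:
            return child
    raise KeyError("no child covers key %d" % key)


def resolve(proxies, start, key):
    """Follow mappings from `start` until we reach the node that stores the key.

    `proxies` maps a proxy name to its mapping. A name that is absent from
    `proxies` is a plain node. A proxy that maps a key to itself is acting
    as a node for that key (the "All Peers" case).
    """
    current = start
    while current in proxies:
        child = child_for_key(proxies[current], key)
        if child == current:
            return current
        current = child
    return current


# All Peers: A and B carry the same mapping and are both proxy and node.
PEER_MAPPING = [(0, HALF, "A"), (HALF, HASH_SPACE, "B")]
ALL_PEERS = {"A": PEER_MAPPING, "B": PEER_MAPPING}

# Tree: the top proxy maps to two mid-level proxies, which map to nodes A-D.
# The nodes themselves know nothing about their parents.
TREE = {
    "top": [(0, HALF, "mid1"), (HALF, HASH_SPACE, "mid2")],
    "mid1": [(0, QUARTER, "A"), (QUARTER, HALF, "B")],
    "mid2": [(HALF, 3 * QUARTER, "C"), (3 * QUARTER, HASH_SPACE, "D")],
}

if __name__ == "__main__":
    key = key_for("some-doc-id")
    # Either peer can take the request and both agree on where the doc lives.
    print(resolve(ALL_PEERS, "A", key), resolve(ALL_PEERS, "B", key))
    print(resolve(TREE, "top", key))
```

Ranges are used at every level on purpose: if each level of a tree applied the same "hash modulo number of children" rule, a lower proxy would only ever see keys that pick the same slot, so some of its children would never receive anything.
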
I don't see adding a child as something that's too complex. The only issue I see is that you'll probably need to replicate data at least twice: once before the proxy mapping is updated, and once after, to pick up any data that was written to the other children since the first replication. This also assumes you've chosen a consistent hashing algorithm, so that the data on all nodes doesn't have to change when adding a single new node. Removing a child node would be the opposite process.

I could foresee us coming up with a tool to automate most if not all of this process, possibly only requiring the user to start the new CouchDB server, fill in some values in Futon for the ID mappings, and press a button.

Sound reasonable?

- Ben
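
P.S. A rough sketch of the consistent hashing property mentioned above, for anyone who hasn't run into it. This is a generic hash ring in Python, not anything CouchDB ships; the `HashRing` class, the MD5-based hash, and the 64 points per node are all assumptions picked for illustration. The point is that adding node D only moves documents onto D, and removing it puts everything back where it was.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a large integer position on the ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring; each node owns several points on the ring."""

    def __init__(self, nodes=(), replicas=64):
        self.replicas = replicas
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.replicas):
            point = _hash("%s#%d" % (node, i))
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        for i in range(self.replicas):
            point = _hash("%s#%d" % (node, i))
            del self._owner[point]
            self._points.remove(point)

    def node_for(self, doc_id):
        # First ring position clockwise from the document's hash, wrapping at the end.
        idx = bisect.bisect(self._points, _hash(doc_id)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    ids = ["doc-%d" % i for i in range(1000)]
    ring = HashRing(["A", "B", "C"])
    before = {d: ring.node_for(d) for d in ids}

    ring.add("D")
    after = {d: ring.node_for(d) for d in ids}
    moved = [d for d in ids if before[d] != after[d]]

    # Only the documents that now belong to D need to be replicated.
    print(len(moved), "of", len(ids), "documents moved")
    print(all(after[d] == "D" for d in moved))
```

With a plain "hash modulo number of nodes" rule, going from 3 to 4 nodes would reassign most documents, which is why the add-a-child process above assumes something like this ring instead.
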
