Re: CouchDB Partitioning Proposal

Randall Leeds Wed, 02 Jun 2010 01:11:23 -0700

On Tue, Jun 1, 2010 at 07:43, Rob Stewart <[email protected]> wrote:
> Hi, CouchDB devs.


Hi, Rob. Welcome!

>
> So, my first, naive, questions might be something like:
> 1. Is the proposal mentioned in the CouchDB wiki page still a valid problem
> (database partitioning).

Yes. That was an easy answer! :)

> 2. If so,  are the plans underway to solve this problem? I notice that there
> was a proposal for the Google Summer of Code in 2009 to provide a solution:
> http://socghop.appspot.com/document/show/user/rleeds/couchdb_cluster .

That's me! Since the time I wrote that proposal I've gone to work for
Meebo as one of the developers of the Lounge project, the canonical
source of which is my repository on github[1]. The Lounge deviates
from my GSoC proposal and the one outlined on the wiki, though. As in
the wiki proposal, the Lounge uses a tree-like structure of CouchDB
databases created through a proxy layer that handles the hashing and
distribution of keys. However, unlike both proposals the proxies work
on the HTTP layer and do not communicate via Erlang message passing.
This design incurs the cost of extra JSON overhead in exchange for
keeping the proxy software relatively simple and completely separate
from CouchDB.
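The post doesn't say which hash the Lounge proxies use. As a rough illustration of what "hashing and distribution of keys" over plain HTTP can look like, here is a minimal Python sketch. It assumes CRC32 modulo a fixed shard count, and the node URLs and per-shard database naming are hypothetical, not the Lounge's actual configuration:

```python
import zlib
from urllib.parse import quote

# Hypothetical deployment: a fixed number of shards, each an ordinary CouchDB
# database, spread over two nodes. Names, URLs and counts are illustrative only.
NUM_SHARDS = 8
SHARD_NODES = {i: f"http://couch{i % 2}.example.com:5984" for i in range(NUM_SHARDS)}


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Map a document ID to a shard index using a stable hash.

    CRC32 is an assumption for this sketch; any hash that is stable across
    processes works (Python's built-in hash() of str is not, since it is
    randomized per process by default).
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def shard_url(db, doc_id):
    """Build the URL a proxy would forward a single-document request to."""
    shard = shard_for(doc_id)
    node = SHARD_NODES[shard]
    # The "<db><shard>" database naming scheme is hypothetical.
    return f"{node}/{db}{shard}/{quote(doc_id, safe='')}"


if __name__ == "__main__":
    print(shard_url("users", "alice"))
```

Because the shard count is baked into the modulus, changing it later remaps most documents to different shards. That is one reason a fixed shard layout is hard to grow.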

In addition to the Lounge, Cloudant[2] is offering clustered CouchDB
hosting using in-house modifications to the CouchDB code. I cannot
speak authoritatively on their work so I won't try to compare it to
the Lounge other than to say that I believe it is written in Erlang.
For this reason, it's possible that pieces of their system could wind up
in CouchDB some day if they decide to license them for inclusion.

I've had a few discussions with Benoît Chesneau about implementing an
Erlang solution, but as I recall it mostly revolved around what
architectural changes we'd want to see to the internal APIs to make
the addition of partitioning as clean as possible. Little to no code
has been produced to this end on our part, though Paul Davis has done
a little bit of hacking[3] toward separating the HTTP layer more
cleanly while replacing MochiWeb with Basho's webmachine[4].

Finally, I've toyed around with the idea of re-implementing the Lounge
using Node.js[5] and Robert Newson has recently started to hack on it
as well. There is some (mostly useless so far) code on github[6].

> 3. If this is still an open problem for the CouchDB dev team, how would one
> get involved in the design of a partitioning architecture for CouchDB ?

Since there has been no consensus on the best way to go forward, there
is clearly room for different approaches and several projects to
fulfill different requirements. For my part, I help maintain the
Lounge for the day-to-day operations at Meebo. However, I would like
to see a project that tackles CouchDB clustering with a peer-to-peer
structure instead of a fixed tree, eliminating the operational
headache of manually distributing a fixed number of shards and taking
some lessons from Dynamo, Cassandra and Riak. My work on Lode, the
Node.js reimplementation of the Lounge, is mostly stalled while I hack
on a structured overlay project for Node.js, though I haven't released
any source for that project yet.
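The post doesn't spell out a design for the peer-to-peer approach. One Dynamo-style lesson such a project might borrow is consistent hashing with virtual nodes, where adding a node moves only a fraction of the keys instead of reshuffling fixed shards. Below is a minimal Python sketch under stated assumptions: the node names, MD5-derived ring positions, and the 64-vnode default are illustrative choices, not part of any of the projects mentioned above.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (Dynamo-style sketch)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owner = {}      # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        # MD5 is used only to spread positions evenly, not for security.
        return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

    def add(self, node):
        """Place `vnodes` virtual copies of `node` around the ring."""
        for i in range(self.vnodes):
            pos = self._hash(f"{node}#{i}")
            if pos not in self._owner:
                bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def remove(self, node):
        """Drop every virtual node owned by `node`."""
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def preference_list(self, key, n):
        """Return up to n distinct nodes clockwise from the key's position.

        In a Dynamo-style system these would hold the key's replicas.
        """
        if not self._positions:
            raise LookupError("ring is empty")
        start = bisect.bisect_right(self._positions, self._hash(key))
        result = []
        for step in range(len(self._positions)):
            pos = self._positions[(start + step) % len(self._positions)]
            node = self._owner[pos]
            if node not in result:
                result.append(node)
                if len(result) == n:
                    break
        return result

    def lookup(self, key):
        """Return the single node that owns `key`."""
        return self.preference_list(key, 1)[0]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.lookup("some-doc-id"), ring.preference_list("some-doc-id", 2))
```

When a node joins, each key either stays where it was or moves to the new node, so roughly 1/(number of nodes) of the keys migrate. By contrast, changing the modulus in a fixed-shard scheme remaps most keys.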

To get involved, keep the conversation going here or come to #couchdb
on freenode. Everyone I mentioned tends to frequent that channel. My
nick is the same as my github account: tilgovi.

I think that's a good overview of the state of CouchDB partitioning
solutions. Bring on the questions and discussion!

Kind regards,
Randall

[1] http://github.com/tilgovi/couchdb-lounge
[2] https://cloudant.com/
[3] http://github.com/davisp/couchdb/tree/webmachine
[4] http://webmachine.basho.com/
[5] http://nodejs.org
[6] http://github.com/tilgovi/lode
