On Nov 3, 2008, at 14:40, Jonathan Ginter wrote:

From what I have read, it sounds like the project is not yet ready to
scale this large, but there are plans in place to do so (faster view
parsers, partitioning, etc.). Is there a rough target for this work? We
have a roadmap for upcoming projects and I need to know whether CouchDB
can be considered for the short term (i.e., within the next 4-6
months) or whether we will have to give it more time to incubate and
come back to it later on in the longer term.

No ETA, but feel free to sponsor development :) The two biggest boosts
for view generation are (as you correctly identified) faster JSON
serialisation on the Erlang end and actually making use of MapReduce's
parallel nature. At the moment, view creation is single-threaded and
limited to a single core on your system.
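
As a rough illustration of the "parallel nature" point, here is a minimal
Python sketch. It is not CouchDB's view server; map_doc and build_view are
hypothetical names made up for this example. Each document is mapped
independently, so the map phase can be spread over several workers before
the emitted rows are reduced.

```python
from concurrent.futures import ThreadPoolExecutor


def map_doc(doc):
    """Hypothetical map function: emit (key, value) pairs for one document."""
    if "type" in doc:
        return [(doc["type"], 1)]
    return []


def build_view(docs, workers=4):
    """Run the map phase over a worker pool, then reduce by summing per key.

    Threads keep the sketch short; CPU-bound map functions in CPython would
    need separate processes to actually use more than one core.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # No document depends on another, so the map calls can run concurrently.
        emitted = list(pool.map(map_doc, docs))

    # Reduce phase: sum the emitted values for each key.
    totals = {}
    for pairs in emitted:
        for key, value in pairs:
            totals[key] = totals.get(key, 0) + value
    return totals
```

Calling build_view with a list of dicts returns a dict of per-key counts;
documents without a "type" field simply emit nothing.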

Just to avoid potential misunderstanding: Incubation is the process of
becoming an Apache project. It has nothing to do with the software
development roadmap.

Cheers
Jan

Jonathan

-----Original Message-----
From: Damien Katz [mailto:[EMAIL PROTECTED]
Sent: Monday, November 03, 2008 6:00 AM
To: couchdb-user@incubator.apache.org
Subject: Re: Largest CouchDB dbs?


On Nov 3, 2008, at 4:38 AM, Jan Lehnardt wrote:


On Nov 3, 2008, at 05:53, Jonathan Ginter wrote:

I have a similar issue.  I am interested in using CouchDB to host a
200+ GB database that will receive well over 200 million documents
per day.  Moreover, the data must roll out - i.e., constant
background purging - and also support UI queries.  And this is just
a starting point to match the abilities of the relational database
we are already running.  I will want the DB to scale up from there.

If there is no hope of CouchDB being able to handle all of that
- regardless of how many machines we deploy - I would like to know
that now before I look any further into this project.

Does anyone have a reasonable idea about whether CouchDB will be
capable of such massive scalability or how many machines it would
take to scale that large?

This sounds like a scenario that CouchDB will ultimately be able to
handle nicely. I don't think we can give out any guarantees about when
and how this will be the case. Maintaining a 200+ GB data set would
require quite some hand-wiring at the moment.


I would appreciate any feedback that anyone might have on this.

I think Damien can chime in here :) Damien?


This is definitely well within what CouchDB should be able to do once
partitioning is in place. I'm not really working on this yet, but
there are a lot of people and companies interested in seeing the
partitioning work done. So maybe some progress will be made soon.

-Damien

