On Sun, Jan 24, 2010 at 2:04 PM, Glenn Rempe <[email protected]> wrote:
> On Sun, Jan 24, 2010 at 12:09 AM, Chris Anderson <[email protected]> wrote:
>
>> Devs,
>>
>> I've been thinking there are a few simple options that would magnify
>> the power of the replicator a lot.
>>
>> ...
>> The fun one is chained map reduce. It occurred to me the other night
>> that the simplest way to present a chainable map reduce abstraction to
>> users is through the replicator. The action "copy these view rows to a
>> new db" is a natural fit for the replicator. I imagine this would be
>> super useful to people doing big messy data munging, and it wouldn't
>> be too hard for the replicator to handle.
>>
>>
> I like this idea as well, as chainable map/reduce has been something I think
> a lot of people would like to use.  The thing I am concerned about, and
> which is related to another ongoing thread, is the size of views on disk and
> the slowness of generating them.  I fear that we would end up ballooning
> views on disk to a size that is unmanageable if we chained them.  I have an
> app in production with 50m rows, whose DB has grown to >100GB, and the views
> take up approx 800GB (!). I don't think I could afford the disk space to
> even consider using this especially when you consider that in order to
> compact a DB or view you need roughly 2x the disk space of the files on
> disk.
>
> I also worry about the time to generate chained views, when the time needed
> for generating views currently is already a major weak point of CouchDB
> (Generating my views took more than a week).
>
> In practice, I think only those with relatively small DBs would be able to
> take advantage of this feature.
>

For large data, you'll want a cluster. The same holds true for other
MapReduce frameworks such as Hadoop or Google's MapReduce.

I'd be interested if anyone with partitioned CouchDB query experience
(Lounger or otherwise) can comment on view generation time when
parallelized across multiple machines.
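
To make the "copy these view rows to a new db" idea concrete, here is a
rough in-memory sketch in Python. It is not CouchDB code and not how the
replicator works today: run_view, rows_to_docs, and the _id scheme are
made-up names for illustration, and the "databases" are plain lists. It
only shows the shape of the chain: run a view, turn each row into a
document, then run a second view over those documents.

```python
import json
from collections import defaultdict


def run_view(docs, map_fn, reduce_fn):
    """Hypothetical stand-in for a grouped view query over a list of docs."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    # One output row per distinct key, sorted by key.
    return [{"key": k, "value": reduce_fn(vs)} for k, vs in sorted(grouped.items())]


def rows_to_docs(rows):
    """Turn view rows into documents for a second database (assumed scheme)."""
    docs = []
    for row in rows:
        # Derive a stable _id from the row key so copying the same row twice
        # targets the same document. This scheme is an assumption.
        doc_id = json.dumps(row["key"], sort_keys=True)
        docs.append({"_id": doc_id, "key": row["key"], "value": row["value"]})
    return docs


# Stage 1: count source documents per user.
source_docs = [
    {"_id": "1", "user": "alice"},
    {"_id": "2", "user": "bob"},
    {"_id": "3", "user": "alice"},
    {"_id": "4", "user": "carol"},
    {"_id": "5", "user": "carol"},
]
stage1 = run_view(source_docs, lambda d: [(d["user"], 1)], sum)

# The "replication" step: view rows become documents in a new db.
copied = rows_to_docs(stage1)

# Stage 2: over the copied rows, count how many users share each total.
stage2 = run_view(copied, lambda d: [(d["value"], 1)], sum)
```

The second view never sees the source documents, only the copied rows,
so each stage keeps its own index on disk. That is the storage cost
Glenn raises above.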

Chris

-- 
Chris Anderson
http://jchrisa.net
http://couch.io
