On Sun, Sep 14, 2008 at 5:44 AM, Hendrijk Templehov <[EMAIL PROTECTED]> wrote:
> So, to come to the point: the second map/reduce job (actually counting
> the words) is fully done by application logic. Once the
> word_count/count view has been executed by CouchDB, CouchDB itself is
> no longer involved in what you're doing there. If you imagine a task
> where more than one (or more than ten) map/reduce jobs are involved,
> only the first one is executed by CouchDB itself. This way you lose
> CouchDB's distributed features, because you simply rely on your own
> application.
Hendrijk,

You're correct that CouchDB does not currently support chained
map/reduce jobs. This is because the incremental update feature (where
only changes to the database have to be taken into account between
queries to the view) has no facility to expire view rows that are
connected to the original documents only through another map/reduce
job.

I've had success copying the output of a map/reduce view into another
database, and then running another set of views on it. There has been
some talk about how to do that while preserving the incremental update
features, but I haven't heard of an implementation yet.

As far as my examples go, it is possible to request from CouchDB a
list of all the words in the books, and a count of each word (across
all books, or for each book individually) through use of the
group_level query parameter. What is *not* supported currently is
outputting the top N words by count. Your application will have to
download the unique list of words with their counts, and sort by count
outside of CouchDB.

group_level examples are available in the CouchDB unit tests (see the
reduce tests):

http://svn.apache.org/repos/asf/incubator/couchdb/trunk/share/www/script/couch_tests.js

There is also some example code from me that uses group_level:

http://jchris.mfdz.com/code/2008/6/markov_chains_using_couchdb_s_g
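To make the group_level part concrete, here's roughly what the
word_count/count view from your example could look like. This is an
untested sketch; the doc.text and doc.book fields are placeholders for
wherever the book text actually lives in your documents.

  // _design/word_count, view "count" -- untested sketch.
  // map: emit one row per word occurrence, keyed [word, book] so
  // group_level can roll the counts up at either granularity
  function(doc) {
    var words = doc.text.split(/\W+/);
    for (var i = 0; i < words.length; i++) {
      if (words[i].length > 0) {
        emit([words[i], doc.book], 1);
      }
    }
  }

  // reduce: sum the 1s from the map rows (sum() is the same helper
  // the reduce unit tests linked above use)
  function(keys, values) {
    return sum(values);
  }

Query it with group_level to pick the granularity (the "books"
database name is made up, and this is the 0.8-style view URL):

  GET /books/_view/word_count/count?group_level=1  -> one row per word, all books
  GET /books/_view/word_count/count?group_level=2  -> one row per [word, book] pair

The top-N-by-count step is the part your application has to do itself,
after downloading the group_level=1 rows.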
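And the copy-into-another-database trick looks something like this.
I'm writing it against the couch.js helper class the unit tests are
built on, just to keep the example short; the "word_counts" target
database and the shape of the copied docs are whatever your second
round of views needs.

  // chain by hand: read the reduced rows out of one database and
  // write them into another, where further views can run over them
  var source = new CouchDB("books");
  var target = new CouchDB("word_counts");
  target.createDb();

  // group_level=1 gives one reduced row per word across all books
  var result = source.view("word_count/count", {group_level: 1});

  // turn each reduced row into a plain document in the target db
  var docs = [];
  for (var i = 0; i < result.rows.length; i++) {
    docs.push({_id: result.rows[i].key[0], count: result.rows[i].value});
  }
  target.bulkSave(docs);

Re-running that copy (by hand, or from cron) is exactly the part
CouchDB can't yet do incrementally for you, which is the limitation
you pointed out.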
--
Chris Anderson
http://jchris.mfdz.com