On Sun, Sep 14, 2008 at 5:44 AM, Hendrijk Templehov <[EMAIL PROTECTED]> wrote:
> So, to come to the point: the second map/reduce job (actually counting
> the words) is fully done by application logic. Once the
> word_count/count view has been executed by CouchDB, CouchDB itself is
> no longer involved in what you're doing there. If you imagine a task
> where more than one (or more than ten) map/reduce jobs are involved,
> only the first one is executed by CouchDB itself. This way you lose
> CouchDB's distributed features, because you simply rely on your own
> application.
Hendrijk,

You're correct that CouchDB does not currently support chained
map/reduce jobs. This is because the incremental update feature (where
only changes to the database have to be taken into account between
queries to the view) has no facility to expire view rows that are
connected to the original documents only through another map/reduce
job.

I've had success copying the output of a map/reduce view into another
database, and then running another set of views on it. There has been
some talk about how to do that while preserving the incremental update
features, but I haven't heard of an implementation yet.

As far as my examples go, it is possible to request from CouchDB a
list of all the words in the books, and a count of each word (across
all books, or for each book individually) through use of the
group_level query parameter. What is *not* supported currently is
outputting the top N words by count. Your application will have to
download the unique list of words with their counts, and sort by count
outside of CouchDB.

group_level examples are available in the CouchDB unit tests (see the
reduce tests):

http://svn.apache.org/repos/asf/incubator/couchdb/trunk/share/www/script/couch_tests.js

There is also some example code from me that uses group_level:

http://jchris.mfdz.com/code/2008/6/markov_chains_using_couchdb_s_g
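To make the group_level part concrete, here's roughly what the
word_count/count view from your example could look like. This is an
untested sketch; the doc.text and doc.book fields are placeholders for
wherever the book text actually lives in your documents.

  // _design/word_count, view "count" -- untested sketch.
  // map: emit one row per word occurrence, keyed [word, book] so
  // group_level can roll the counts up at either granularity
  function(doc) {
    var words = doc.text.split(/\W+/);
    for (var i = 0; i < words.length; i++) {
      if (words[i].length > 0) {
        emit([words[i], doc.book], 1);
      }
    }
  }

  // reduce: sum the 1s from the map rows (sum() is the same helper
  // the reduce unit tests linked above use)
  function(keys, values) {
    return sum(values);
  }

Query it with group_level to pick the granularity (the "books"
database name is made up, and this is the 0.8-style view URL):

  GET /books/_view/word_count/count?group_level=1  -> one row per word, all books
  GET /books/_view/word_count/count?group_level=2  -> one row per [word, book] pair

The top-N-by-count step is the part your application has to do itself,
after downloading the group_level=1 rows.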
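And the copy-into-another-database trick looks something like this.
I'm writing it against the couch.js helper class the unit tests are
built on, just to keep the example short; the "word_counts" target
database and the shape of the copied docs are whatever your second
round of views needs.

  // chain by hand: read the reduced rows out of one database and
  // write them into another, where further views can run over them
  var source = new CouchDB("books");
  var target = new CouchDB("word_counts");
  target.createDb();

  // group_level=1 gives one reduced row per word across all books
  var result = source.view("word_count/count", {group_level: 1});

  // turn each reduced row into a plain document in the target db
  var docs = [];
  for (var i = 0; i < result.rows.length; i++) {
    docs.push({_id: result.rows[i].key[0], count: result.rows[i].value});
  }
  target.bulkSave(docs);

Re-running that copy (by hand, or from cron) is exactly the part
CouchDB can't yet do incrementally for you, which is the limitation
you pointed out.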
--
Chris Anderson
http://jchris.mfdz.com