Paul's advice is right on - if you can get the data using a range query on a map view (without reduce), you should do that. If you need to aggregate very many rows into a short value, reduce is your friend.
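For example (a minimal sketch - the doc fields here are made up), a map like this supports startkey/endkey range queries directly:

  function(doc) {
    // one row per document, keyed so range queries work
    if (doc.timestamp) {
      emit(doc.timestamp, null);
    }
  }

and a count-style reduce stays small no matter how many rows it covers. The third parameter tells you whether you're combining earlier reduce results rather than raw rows:

  function(keys, values, rereduce) {
    if (rereduce) {
      // values are partial counts from earlier passes - add them up
      var total = 0;
      for (var i = 0; i < values.length; i++) total += values[i];
      return total;
    }
    return values.length; // count the rows in this batch
  }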
On Wed, Aug 20, 2008 at 1:32 PM, Nicholas Retallack <[EMAIL PROTECTED]> wrote:
> Replacing 'return values' with 'return values.length' shows you're
> right. 4 minutes for the first query, milliseconds afterward, as
> opposed to forever.
> That sounds like the query times I'm getting.
>
> Are there plans to make reduce work for these more general
> data-mangling tasks? Or should I be approaching the problem a
> different way? Perhaps write my map calls differently so they produce
> more rows for reduce to compact? Or do something special if the third
> parameter to reduce is true?

"Plans" would be a strong term, but I've been digging through the source lately, thinking about ways to make a more Hadoop-like map process. I've prototyped remap in Ruby:

http://github.com/jchris/couchrest/tree/master/utils/remap.rb

The driving use case is a list of URLs, as output from a view, that are each fetched by the view server (robots.txt etc etc), with the fetched results stored as new documents. Essentially a Nutch implementation backed by CouchDB. Of course this could be an application process running against the HTTP API, but CouchDB's view-server plugin architecture could make managing data even easier than Hadoop does.

I've got my crazy-idea hat on, so don't expect to see this in trunk soon. ;)

Chris

--
Chris Anderson
http://jchris.mfdz.com
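For reference, the "application process running against the HTTP API" variant might look something like this rough sketch (the database name, view path, and document fields are all hypothetical, and it's written as modern JavaScript assuming Node's global fetch):

  // Sketch only: pull URLs from a view, fetch each one, and store
  // the result back as a new document via the plain HTTP API.
  const couch = 'http://127.0.0.1:5984/crawler';

  async function crawl() {
    // read pending URLs from a (hypothetical) map view
    const res = await fetch(couch + '/_design/spider/_view/pending');
    const { rows } = await res.json();

    for (const row of rows) {
      const page = await fetch(row.key);   // fetch the URL itself
      const body = await page.text();

      // store the fetched result as a new document
      await fetch(couch, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ url: row.key, fetched_at: Date.now(), body })
      });
    }
  }

  crawl().catch(console.error);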
