On Fri, Jun 5, 2009 at 7:13 AM, Zachary Zolton <[email protected]> wrote:
> So, Chris, it sounds like you're saying that POSTing to that URL will
> place the entire results of querying the view with group=true into
> another database. Sounds great!
>
> Will it work with 0.9? Would you suggest automating this using _changes?
>
I doubt this will get backported to the 0.9.x branch. However, this is possible with 0.9 if you do it in a client. There are examples in my CouchRest client of running a Ruby function over the unique keys in a map view, but the pattern of just dumping the results of a group reduce query into another DB is simple and effective.

What I'm adding is simply a shortcut so that people can more effectively play around with chaining map reduce queries.

For now the snapshot dbs will not update incrementally. However, they are just documents, so you can do in-place transformations on them (if you want).

---

Actually I'm having second thoughts about putting this into CouchDB. It's still a worthwhile technique, but I think we should encourage you to use HTTP tools to run it. Here's why:

On a single node, this would be all well and good - you'd be able to get a sorted list of tags by popularity by running a simple map-by-group-reduce-value view on the snapshot database.

On a clustered setup, like the one couchdb-lounge provides, you'd end up with problems, as each snapshot db would only reflect reductions run locally (on the single shard). This is because the Erlang API used by Hovercraft is not a multi-node API. Eventually we could give CouchDB an internal Erlang proxy - but for now, multi-node clusters must be built on HTTP.

So, since these Hovercraft chain snapshots are built against a single node, the fully merged sort-by-value map query across the cluster could have incorrect ordering. To guarantee correct ordering of tags by popularity in a clustered deployment, you'd have to run the global reduce function not against a single local node but against the entire cluster, via something like couchdb-lounge's Twisted Python rereducing proxy.

Ergo, a group-reduce chaining library is better off not written via Hovercraft, because it should use the HTTP API. Anyone have a Python version of this?
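For anyone who wants to try the client-side pattern, here is a minimal sketch in Python of the group-reduce copy over plain HTTP. It is not the Hovercraft code and not a tested library: the names rows_to_docs, snapshot_view, and base_url are made up for illustration, it assumes the target database already exists, and it assumes the view returns the usual {"rows": [{"key": ..., "value": ...}]} shape. It uses only the standard library, the view's group / group_level query options, and the target's _bulk_docs endpoint.

```python
import json
import urllib.parse
import urllib.request


def rows_to_docs(rows):
    """Turn group-reduce rows into plain docs of the form {"key": ..., "value": ...}."""
    return [{"key": row["key"], "value": row["value"]} for row in rows]


def snapshot_view(base_url, view_path, target_db, group_level=None):
    """Copy grouped reduce results from a view into target_db over HTTP.

    base_url, view_path and target_db are illustrative parameters, e.g.
    "http://localhost:5984", "/srcdb/_design/app/_view/reduce_count", "/targetdb".
    """
    # group=true gives one row per unique key; group_level=N groups on a key prefix.
    if group_level is None:
        params = {"group": "true"}
    else:
        params = {"group_level": str(group_level)}
    view_url = base_url + view_path + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(view_url) as resp:
        rows = json.load(resp)["rows"]

    # Write all rows to the target in one request. The docs carry no _id, so the
    # server assigns one; running this twice would therefore duplicate the docs.
    body = json.dumps({"docs": rows_to_docs(rows)}).encode("utf-8")
    req = urllib.request.Request(
        base_url + target_db + "/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Pointing base_url at a cluster's HTTP proxy rather than at a single node is what would let the reduce reflect the whole cluster. Since the snapshot does not update incrementally, a fresh or emptied target database is the safe choice for each run.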
Performance freaks don't worry - in this application of HTTP there are just a handful of long-running connections, and you should be able to get disk IO bound even with the HTTP overhead.

Chris

> Cheers,
> Zach
>
> On Fri, Jun 5, 2009 at 6:17 AM, Viacheslav Seledkin
> <[email protected]> wrote:
>> Chris Anderson wrote:
>>>
>>> I finally got around to writing my map reduce copier. It's still
>>> basic, but what do you think?
>>>
>>> I want to put it into trunk as an http call, like:
>>>
>>> POST /_snapshot_view
>>>
>>> with JSON
>>>
>>> {"src":"/srcdb/_design/app/_view/reduce_count", "group_level":2,
>>> "target":"/targetdb"}
>>>
>>> Chainable map reduce seems to be one of the most popular requests on
>>> the survey we took, so hopefully this will make the heavy-data crew
>>> happy.
>>>
>>> There is an implementation here:
>>>
>>>
>>> http://github.com/jchris/hovercraft/commit/34b44527b660a740858cc71aa2c8326747465e31#L0R290
>>>
>>> What this does is take the results you'd get from querying your reduce
>>> view with group=true, and copy them to a new database. Basically you
>>> end up with a database full of docs that look like:
>>>
>>> {
>>> "key":[2009,2,14],
>>> "value": 511
>>> }
>>>
>>> Since they are docs sitting in another CouchDB, you can use more
>>> ordinary CouchDB Map Reduce views on that database to do things like
>>> sort by value, so you can for instance sort tags by popularity, or
>>> days by user activity, etc.
>>>
>>> Chris
>>>
>>>
>>> --
>>> Chris Anderson
>>> http://jchrisa.net
>>> http://couch.io
>>>
>>
>> Will the snapshot db be updated incrementally?
>>
>

--
Chris Anderson
http://jchrisa.net
http://couch.io
