I would definitely +1 something like this. I'm essentially do something like this "manually", with an update notifier script, and I'd love to see it become a first-class feature —instead of my special-case version of it.
On Wed, Apr 22, 2009 at 7:40 AM, Wout Mertens <[email protected]> wrote: > Intro > ===== > How do you sort by reduce value? How do you join views? How do you get > unique view results? How do you cache group key reduces? > > I think that with the below proposed solution all the above and more are > possible. The general idea is to store view results and run map/reduce on > them. There's been some discussions about this but they went nowhere. I've > been thinking about this issue a bit and I think it can be done. > > I'd like to call this feature a Review DB. > > Use cases > ========= > - Suppose you want to know what tags are most popular on your blog. Simply > get: > > http://couchdb/db/_design/myblog/_review/tags_by_count/_view/sort_by_value > > Where tags_by_count is a Review DB that gets input from the tagcount view > and then runs the sort_by_value view on it, a map() function that simply > emits (value,key). > > Likewise, show pages in order of popularity, whereby user can vote up (+1) > or down (-1): > > http://couchdb/db/_design/mywiki/_review/pagevotes/_view/sort_by_value > > - Given documents with attributes title, date and tags. You'd like to know > the minimum value of date and a breakdown by count for tags, for every > title. Normally you'd use 2 map+reduce views, minimum_date_by_title and > tagcount_by_title, which you would then query separately. With a Review DB, > you can let both views insert their results in the database and then run a > view that combines the results in one view: > > http://couchdb/db/_design/mybookstore/_review/mybooks/_view/aggregate_book_data > > - This is not a way to run an on-the-fly map/reduce on a subset of a view, > like if you want to find the median popularity score of restaurants with > "Tony" in their name that are close to you. > > Implementation > ============== > A Review DB is a hidden database maintained by CouchDB with these fields: > - _id of document is the string representation of the key > - "key" is the key of the incoming view row (unique) > - "value" is the value of the incoming view row > > I hope that this is sufficiently like a normal view that it can be stored as > a normal view. _id is just there to make it doc-compliant, it would be much > better if "key" were the actual key. > > A Review DB is defined in a design document like normal views. Each review > is an entry in the "reviews" hash, and has a "incoming_views" array that > lists all the views that should insert results in the review db plus the > group level, as well as a normal "views" hash for further map/reduce of the > review db (and perhaps another "reviews" hash for further result > processing?). > > Maintaining a database of results means that results have to be updated or > even removed when documents change. I tried to make this work (in theory) > for map-only views, but the resulting requirements are quite messy. You > either need to cache the previous results of a view for each document, or > you have to have an old version of the document available to regenerate > those results. > > Therefore, a Review DB only accepts results from one or more map+reduce > views. You define beforehand what the group_level of the keys is that will > be inserted. > > Furthermore, a Review DB disallows (but doesn't enforce) having 2 views that > generate the same keys. Otherwise, refcounting would need to be used and > while that's not difficult, I think there's limited value in allowing this. > > The Review DB needs updating every time the reduction for a group key of one > of the participating views gets updated. Even though a map+reduce view has > unique keys, we need a refcount since we have multiple views. Whoever got to > insert its value last wins. > > There is a slight complication: group key values are calculated on-the-fly > from the view result b-tree. So whenever a reduce call results in a new > value for a b-tree node, AND that node is the upper node of a subtree that > is completely part of a group key, that group key needs to be marked for > recalculation. > > Likewise, if deletion/addition of a b-tree node results in the > removal/creation of the sole upper node of a group key subtree, that group > key needs to be marked for removal/addition. > > This is the algorithm: > - When a reducing view gets updated, and it is part of a Review DB, use the > 2 paragraphs above to keep a list of group keys that need handling > - After updating the reduce() results, for each of the marked group keys: > - If a group key gets removed: > - look up doc with key=group key in review db. If exists: > - delete doc > - If a group key gets added: > - look up doc with key=group key in review db. If exists: > - set doc.value to the row value > - else > - create doc with id=group key in string form, key=group key, > value=value > - If a group key gets updated: > - look up doc with key=group key in review db. If exists: > - set doc.value to the row value > - else > - create doc with id=group key in string form, key=group key, > value=value > > As you can see, this is something CouchDB should do since it knows when it's > updating group key reduction values and it knows if this was an delete, > update or addition. > > View updates are done when the view is called; Review updates are done at > this time as well. Views on Review DBs are done when they are called. > > Summary > ======= > Review DBs are a sort of view index that CouchDB can maintain with little > overhead. It caches group key results and allows chained map+reduce > calculations using mostly existing frameworks. > > I think this would be a very useful feature for CouchDB to have. There are > regularly requests for storing view results in a database for > post-processing on the mailing lists. > > I'm not saying this is a trivial change but it doesn't seem technically > impossible to me either. (unless I missed something again; this is the 5th > iteration of this proposal. Anyway I know *I* wouldn't be able to code this > :-) ) > > What do you think, oh dear devs? > > Wout. >
