On Sun, Apr 26, 2009 at 11:26:33PM +0200, Wout Mertens wrote:
> In a nutshell, I'm hoping that:
> * A review is a new sort of view that has an "inputs" array in its
>   definition.
> * Only MR views are allowed as inputs, no KV duplication allowed.
> * It builds a persistent index of the incoming views when those get
>   updated.
> * That index is then used to build the view index for the review when
>   the review gets updated.
> * I think I covered the most important algorithms needed to implement
>   this in my original proposal.
>
> Does this sound feasible? If so I'll update my proposal accordingly.

Could you define a bit more where the "inputs" array comes from?

It can't be the overall reduce value for the whole database - that would
just be one single value :-) So it has to be the reduce value in some
grouped form, but there are multiple ways to slice that: e.g. group=true,
group_level=1, group_level=2 etc. This is something which will have to be
chosen by the user.

Furthermore, if I understand rightly, these grouped reduce values are
*not* persisted anywhere (*). That is: every time you do a
reduce=true&group=true query, the entire map index is scanned for
distinct keys, and then each set of values with equal keys is passed
afresh to the reduce function.

This means that it won't be possible to do incremental changes to the
review DB, since these grouped keys and reduce values aren't associated
with the docids they came from. You'd have to calculate it all from
scratch every time, in which case you might as well just get the client
to do it. AFAICS, CouchDB could cache the result only when the database
is not receiving any updates.

OTOH, it's quite possible that I have misunderstood entirely :-)

Regards,

Brian.

(*) Each B-tree node containing N key/value pairs also contains a single
reduce value for those N documents. However the B-tree nodes are not
aligned in any way with the map keys. These precalculated values do
allow quicker calculation of reduce values where a large number of
documents emit the same key, spanning multiple B-tree nodes, since the
already-reduced values can be re-reduced together.
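
To make the slicing question and the footnote concrete, here is a rough
Python sketch. It is not CouchDB code: the rows, the sum reduce and the
helper names (grouped_reduce, node_partials) are all made up for
illustration, and the fixed-size "nodes" only stand in for B-tree nodes.
It shows that the same map output gives different grouped results
depending on the group level, and that per-node reduce values ignore key
boundaries but can still be re-reduced into a larger result.

```python
from itertools import groupby

# Hypothetical map output, already sorted by key as a view index would be.
rows = [
    (["2009", "04", "25"], 3),
    (["2009", "04", "26"], 5),
    (["2009", "05", "01"], 2),
    (["2010", "01", "01"], 7),
]


def reduce_fn(values, rereduce=False):
    # A simple sum; summing partial sums is also a valid re-reduce.
    return sum(values)


def grouped_reduce(rows, group_level):
    """Scan the sorted rows and reduce each run sharing a key prefix."""
    def prefix(row):
        return tuple(row[0][:group_level])

    return {
        key: reduce_fn([value for _, value in run])
        for key, run in groupby(rows, key=prefix)
    }


def node_partials(rows, node_size):
    """One precomputed reduce value per fixed-size chunk of rows.

    The chunks are cut by position, not by key, so a chunk may mix
    several keys and one key may span several chunks.
    """
    return [
        reduce_fn([value for _, value in rows[i:i + node_size]])
        for i in range(0, len(rows), node_size)
    ]


by_year = grouped_reduce(rows, 1)    # like group_level=1
by_month = grouped_reduce(rows, 2)   # like group_level=2
partials = node_partials(rows, 2)
total = reduce_fn(partials, rereduce=True)
```

Note that by_year and by_month are both recomputed by a full scan here,
which is the behaviour described above; only the per-chunk partials are
"stored", and they do not line up with either grouping.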
