Re: Proposal: Review DBs

Brian Candler Tue, 28 Apr 2009 02:54:43 -0700

On Sun, Apr 26, 2009 at 11:26:33PM +0200, Wout Mertens wrote:
> In a nutshell, I'm hoping that:
> * A review is a new sort of view that has an "inputs" array in its  
> definition.
> * Only MR views are allowed as inputs, no KV duplication allowed.
> * It builds a persistent index of the incoming views when those get  
> updated.
> * That index is then used to build the view index for the review when  
> the review gets updated.
> * I think I covered the most important algorithms needed to implement  
> this in my original proposal.
>
> Does this sound feasible? If so I'll update my proposal accordingly.


Could you say a bit more precisely where the "inputs" array comes from?

It can't be the overall reduce value for the whole database - that would
just be one single value :-) So it has to be the reduce value in some
grouped form, but there are multiple ways to slice that: e.g. group=true,
group_level=1, group_level=2 etc. This is something which will have to be
chosen by the user.
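
To make the slicing concrete, here is a minimal sketch in Python (not CouchDB
code). The sample rows, the array-shaped keys and the helper name
grouped_reduce are all made up for illustration, and the reduce is assumed to
be a plain sum. group_level=N is modelled as "truncate the key to its first N
elements", with None standing in for group=true.

```python
from itertools import groupby

# Hypothetical map output: (key, value) rows, already sorted by key.
rows = [
    (["2009", "04", "26"], 3),
    (["2009", "04", "27"], 5),
    (["2009", "05", "01"], 2),
    (["2010", "01", "15"], 7),
]

def grouped_reduce(rows, group_level=None):
    """Sum the values per group.

    group_level=None stands in for group=true (group on the full key);
    group_level=N groups on the first N elements of the key.
    """
    def group_key(row):
        key = row[0]
        return tuple(key if group_level is None else key[:group_level])

    return [
        (list(k), sum(value for _, value in grp))
        for k, grp in groupby(rows, key=group_key)
    ]

by_year = grouped_reduce(rows, group_level=1)
by_month = grouped_reduce(rows, group_level=2)
by_full_key = grouped_reduce(rows)
```

The same map rows yield two, three or four "input" rows depending on the
level chosen, which is why the review definition would have to name one.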

Furthermore, if I understand correctly, these grouped reduce values are *not*
persisted anywhere (*). That is, every time you run a reduce=true&group=true
query, the entire map index is scanned for distinct keys, and each set of
values with equal keys is passed afresh to the reduce function.

This means that it won't be possible to make incremental changes to the review
DB, since these grouped keys and reduce values aren't associated with the
docids they came from. You'd have to recalculate everything from scratch every
time, in which case you might as well just get the client to do it. AFAICS,
CouchDB could cache the result only while the database is not receiving any
updates.
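
A small Python sketch of the problem, under the assumptions stated above (it
is an illustration, not how CouchDB stores anything). The rows, docids and the
helper grouped_rows are hypothetical, and max is used as the reduce because it
makes the loss of information obvious.

```python
def grouped_rows(map_rows):
    """Collapse (key, docid, value) map rows into {key: reduced value}.

    The docids are dropped on the way, as they would be in a grouped
    reduce result.
    """
    values_by_key = {}
    for key, _docid, value in sorted(map_rows):
        values_by_key.setdefault(key, []).append(value)
    return {key: max(values) for key, values in values_by_key.items()}

map_rows = [("x", "doc1", 4), ("x", "doc2", 9), ("y", "doc3", 2)]
before = grouped_rows(map_rows)

# doc2 is deleted. A consumer holding only before["x"] cannot tell whether
# doc2 contributed the maximum, nor what the next-largest value was, so the
# group has to be recomputed from the map rows.
map_rows = [row for row in map_rows if row[1] != "doc2"]
after = grouped_rows(map_rows)
```

Here before["x"] is 9 and after["x"] is 4, and nothing in the grouped result
alone would let you get from one to the other.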

OTOH, it's quite possible that I have misunderstood entirely :-)

Regards,

Brian.

(*) Each B-tree node containing N key/value pairs also contains a single
reduce value for those N documents. However the B-tree nodes are not aligned
in any way with the map keys. These precalculated values do allow quicker
calculation of reduce values where a large number of documents emit the same
key, spanning multiple B-tree nodes, since the already-reduced values can be
re-reduced together.
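
A rough Python sketch of the footnote, with loud caveats: it models only a
flat list of leaf nodes rather than a real B-tree, it walks every leaf instead
of descending from the root, and the node size, sample data and function names
are invented for illustration. The reduce is assumed to be a sum.

```python
def reduce_fn(values, rereduce=False):
    # For a sum, reduce and rereduce happen to be the same operation.
    return sum(values)

# Sorted (key, value) pairs, cut into nodes without regard to key boundaries.
pairs = [("a", 1), ("a", 2), ("b", 1), ("b", 1),
         ("b", 4), ("b", 2), ("b", 3), ("c", 5)]
NODE_SIZE = 3
leaves = [pairs[i:i + NODE_SIZE] for i in range(0, len(pairs), NODE_SIZE)]

# Each node carries one precomputed reduce value for everything it holds.
node_reductions = [reduce_fn([v for _, v in leaf]) for leaf in leaves]

def reduce_for_key(key):
    """Reduce all values for one key, reusing whole-node reductions."""
    partials = []
    for leaf, cached in zip(leaves, node_reductions):
        if all(k == key for k, _ in leaf):
            # The whole node belongs to this key: reuse its stored value.
            partials.append(cached)
        else:
            # Boundary node: reduce only the matching pairs.
            values = [v for k, v in leaf if k == key]
            if values:
                partials.append(reduce_fn(values))
    return reduce_fn(partials, rereduce=True)

total_b = reduce_for_key("b")
```

For key "b" the middle node is covered entirely, so its stored value is reused
and only the two boundary nodes need their matching pairs reduced again.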
