Re: Proposal: Review DBs

Wout Mertens Tue, 28 Apr 2009 05:14:34 -0700

On Apr 28, 2009, at 11:42 AM, Brian Candler wrote:

On Sun, Apr 26, 2009 at 11:26:33PM +0200, Wout Mertens wrote:

In a nutshell, I'm hoping that:
* A review is a new sort of view that has an "inputs" array in its
definition.
* Only MR views are allowed as inputs, no KV duplication allowed.
* It builds a persistent index of the incoming views when those get
updated.
* That index is then used to build the view index for the review when
the review gets updated.
* I think I covered the most important algorithms needed to implement
this in my original proposal.


Does this sound feasible? If so I'll update my proposal accordingly.


Could you define a bit more where the "inputs" array comes from?

Oh right, I didn't explain that well, and I never said how to add thegroup_level. Here's an example for counting the number of unique namesin a database:


{
  "_id": "_design/foo",
  "views": {
      "unique_names": {
          "map": "function(doc) {emit(doc.name, 1);}",

"reduce": "function(keys, values, rereduce) {returnsum(values);}"

     }
      "unique_names_count": {
          "inputs": {"unique_names":1},
          "map": "function(doc) {emit(doc.key, 1);}",

"reduce": "function(keys, values, rereduce) {returnsum(values);}"

     }
  }
}

In this case, unique_names_count is the review db. It gets input fromthe unique_names view with group_level set to 1. Note that it's a hashsince each view only gets added once and each view has a unique name.

(BTW, I can't figure out what to set group_level to if I want theequivalent of group=true. group_level=exact doesn't work althoughgroup_level=1000 might be good enough. How about allowinggroup_level=-1?)

(Also, in this case just "unique_names_count": {"inputs":{"unique_names":1}} would suffice since getting that view with limit=0will give you the total_rows number which is what was needed)

Furthermore, if I understand rightly, these grouped reduce valuesare *not*persisted anywhere (*). That is: every time you do areduce=true&group=truequery then the entire map index is scanned for distinct keys andthen each
set of values with equal keys is passed afresh to the reduce function.
This means that it won't be possible to do incremental changes tothe reviewDB, since these grouped keys and reduce values aren't associatedwith thedocids they came from. You'd have to calculate it all from scratcheverytime, in which case you might as well just get the client to do it.AFAICS,CouchDB could cache the result only when the database is notreceiving any
updates.

You're correct in that group reduce values are not persisted, howeverthe algorithm I proposed can do incremental updates nonetheless:

=========

Group key values are calculated on-the-fly from the view result b-tree. So whenever a reduce call results in a new value for a b-treenode, AND that node is the upper node of a subtree that is completelypart of a group key, that group key needs to be marked forrecalculation.

Likewise, if deletion/addition of a b-tree node results in the removal/creation of the sole upper node of a group key subtree, that group keyneeds to be marked for removal/addition.


This is the algorithm:

- When a reducing view gets updated, and it is part of a Review DB,use the 2 paragraphs above to keep a list of group keys that needhandling- After updating the reduce() results, for each of the marked groupkeys:

- If a group key gets removed:
  - look up doc with key=group key in review db. If exists:
    - delete doc
- If a group key gets added:
  - look up doc with key=group key in review db. If exists:
    - set doc.value to the row value
  - else

- create doc with id=group key in string form, key=group key,value=value

- If a group key gets updated:
  - look up doc with key=group key in review db. If exists:
    - set doc.value to the row value
  - else

- create doc with id=group key in string form, key=group key,value=value

========
Is that explanation clear?

I'm imagining that at any point on the b-tree where a reduce value isreceived, CouchDB can see if that node is one of the top nodes for agroup key and can mark that group key somehow. I think that should befeasible (although I admit I've never tried to grok the code).


Cheers,

Wout.

Re: Proposal: Review DBs

Reply via email to