On Apr 28, 2009, at 8:13 AM, Wout Mertens wrote:

On Apr 28, 2009, at 11:42 AM, Brian Candler wrote:

Furthermore, if I understand rightly, these grouped reduce values are *not* persisted anywhere (*). That is: every time you do a reduce=true&group=true query then the entire map index is scanned for distinct keys and then each set of values with equal keys is passed afresh to the reduce function.

This means that it won't be possible to do incremental changes to the review DB, since these grouped keys and reduce values aren't associated with the docids they came from. You'd have to calculate it all from scratch every time, in which case you might as well just get the client to do it. AFAICS, CouchDB could cache the result only when the database is not receiving any
updates.

You're correct in that group reduce values are not persisted, however the algorithm I proposed can do incremental updates nonetheless:
=========
Group key values are calculated on-the-fly from the view result b- tree. So whenever a reduce call results in a new value for a b-tree node, AND that node is the upper node of a subtree that is completely part of a group key, that group key needs to be marked for recalculation.

Likewise, if deletion/addition of a b-tree node results in the removal/creation of the sole upper node of a group key subtree, that group key needs to be marked for removal/addition.

This is the algorithm:
- When a reducing view gets updated, and it is part of a Review DB, use the 2 paragraphs above to keep a list of group keys that need handling - After updating the reduce() results, for each of the marked group keys:
- If a group key gets removed:
 - look up doc with key=group key in review db. If exists:
   - delete doc
- If a group key gets added:
 - look up doc with key=group key in review db. If exists:
   - set doc.value to the row value
 - else
- create doc with id=group key in string form, key=group key, value=value
- If a group key gets updated:
 - look up doc with key=group key in review db. If exists:
   - set doc.value to the row value
 - else
- create doc with id=group key in string form, key=group key, value=value
========
Is that explanation clear?

I'm imagining that at any point on the b-tree where a reduce value is received, CouchDB can see if that node is one of the top nodes for a group key and can mark that group key somehow. I think that should be feasible (although I admit I've never tried to grok the code).

Cheers,

Wout.

I'm not an expert on the btree code, but I think this statement

So whenever a reduce call results in a new value for a b-tree node, AND that node is the upper node of a subtree that is completely part of a group key, that group key needs to be marked for recalculation.


misses some updates.  I thought things looked something like

          root
            |
     ---------------
    kp1           kp2
     |             |
 ---------     ---------
foo foo bar   bar bar bar

where foo and bar are emitted keys from a map (values not show for the sake of brevity). kp1 and kp2 hold reduce(foo,foo,bar) and reduce(bar,bar,bar) respectively, and root holds rereduce(kp1,kp2).

When we request a view with group=true, Couch is smart enough to use the reduction stored in kp2 directly, but it has to call reduce(foo,foo) and reduce(bar) on the fly for the nodes underneath kp1 (and then rereduce the bar reductions from kp1 and kp2 to get the result for bar). If I interpreted Wout's statement correctly, it ignores the case where any of the nodes under kp1 change, since kp1 is not "the upper node of a subtree that is completely part of a group key".

I think the problem of tracking which group keys to update can be made simpler. We really only need to see the key updates coming out of the incremental map. Couch knows the lists of keys to add and keys to remove at that point; the set of all unique keys in those two lists (the "changeset") is the set of group keys that would need to be updated in the Review DB. Here I'm using "update" in the general sense of add/remove/change; the way I see it, we could just query the view for all the keys in the changeset. If the MR view has no results for a given key, that obviously means delete the associated document from the Review DB.

group_levels are a straightforward extension -- we just check what group_level we'll be using in the Review DB when we calculate the changeset.

Regards, Adam

Reply via email to