On Apr 28, 2009, at 3:46 PM, Adam Kocoloski wrote:

I'm not an expert on the btree code, but I think this statement

So whenever a reduce call results in a new value for a b-tree node, AND that node is the upper node of a subtree that is completely part of a group key, that group key needs to be marked for recalculation.


misses some updates.  I thought things looked something like

         root
           |
    ---------------
   kp1           kp2
    |             |
---------     ---------
foo foo bar   bar bar bar

where foo and bar are emitted keys from a map (values not show for the sake of brevity). kp1 and kp2 hold reduce(foo,foo,bar) and reduce(bar,bar,bar) respectively, and root holds rereduce(kp1,kp2).

That is my understanding as well.

When we request a view with group=true, Couch is smart enough to use the reduction stored in kp2 directly, but it has to call reduce(foo,foo) and reduce(bar) on the fly for the nodes underneath kp1 (and then rereduce the bar reductions from kp1 and kp2 to get the result for bar). If I interpreted Wout's statement correctly, it ignores the case where any of the nodes under kp1 change, since kp1 is not "the upper node of a subtree that is completely part of a group key".

In this case, both foos are the upper node of a subtree that is completely part of a group key. bar isn't.

Same graph with the group keys indicated:

         root
           |
    ------BAR------
   kp1           kp2
    |             |
--FOO----     ---------
foo foo bar   bar bar bar

So whenever the first bar changes, kp1 needs to be calculated and then BAR is marked as needing updating, since kp1 is a top node under BAR.

FOO is only marked for updating when either of the foos change.

I think the problem of tracking which group keys to update can be made simpler. We really only need to see the key updates coming out of the incremental map. Couch knows the lists of keys to add and keys to remove at that point; the set of all unique keys in those two lists (the "changeset") is the set of group keys that would need to be updated in the Review DB. Here I'm using "update" in the general sense of add/remove/change; the way I see it, we could just query the view for all the keys in the changeset. If the MR view has no results for a given key, that obviously means delete the associated document from the Review DB.

That is a very good idea! It's not as efficient as marking group keys like I proposed (maps need not necessarily change reduce values), but it's a lot easier to code.

group_levels are a straightforward extension -- we just check what group_level we'll be using in the Review DB when we calculate the changeset.

Exactly.

So the algorithm becomes:

- When updating a view, keep track of all mentioned keys in the previous and current map() output, keep only group_level key parts.

- After updating the reduce() results, for each of the marked group keys:
- If a group key gets removed:
- look up doc with key=group key in review db. If exists:
  - delete doc
- If a group key gets added:
- look up doc with key=group key in review db. If exists:
  - set doc.value to the row value
- else
- create doc with id=group key in string form, key=group key, value=value
- If a group key gets updated:
- look up doc with key=group key in review db. If exists:
  - set doc.value to the row value
- else
- create doc with id=group key in string form, key=group key, value=value

Good thinking!

Wout.

Attachment: smime.p7s
Description: S/MIME cryptographic signature

Reply via email to