On Apr 28, 2009, at 8:13 AM, Wout Mertens wrote:
On Apr 28, 2009, at 11:42 AM, Brian Candler wrote:
Furthermore, if I understand rightly, these grouped reduce values
are *not*
persisted anywhere (*). That is: every time you do a
reduce=true&group=true
query then the entire map index is scanned for distinct keys and
then each
set of values with equal keys is passed afresh to the reduce
function.
This means that it won't be possible to do incremental changes to
the review
DB, since these grouped keys and reduce values aren't associated
with the
docids they came from. You'd have to calculate it all from scratch
every
time, in which case you might as well just get the client to do it.
AFAICS,
CouchDB could cache the result only when the database is not
receiving any
updates.
You're correct in that group reduce values are not persisted,
however the algorithm I proposed can do incremental updates
nonetheless:
=========
Group key values are calculated on-the-fly from the view result b-
tree. So whenever a reduce call results in a new value for a b-tree
node, AND that node is the upper node of a subtree that is
completely part of a group key, that group key needs to be marked
for recalculation.
Likewise, if deletion/addition of a b-tree node results in the
removal/creation of the sole upper node of a group key subtree, that
group key needs to be marked for removal/addition.
This is the algorithm:
- When a reducing view gets updated, and it is part of a Review DB,
use the 2 paragraphs above to keep a list of group keys that need
handling
- After updating the reduce() results, for each of the marked group
keys:
- If a group key gets removed:
- look up doc with key=group key in review db. If exists:
- delete doc
- If a group key gets added:
- look up doc with key=group key in review db. If exists:
- set doc.value to the row value
- else
- create doc with id=group key in string form, key=group key,
value=value
- If a group key gets updated:
- look up doc with key=group key in review db. If exists:
- set doc.value to the row value
- else
- create doc with id=group key in string form, key=group key,
value=value
========
Is that explanation clear?
I'm imagining that at any point on the b-tree where a reduce value
is received, CouchDB can see if that node is one of the top nodes
for a group key and can mark that group key somehow. I think that
should be feasible (although I admit I've never tried to grok the
code).
Cheers,
Wout.
I'm not an expert on the btree code, but I think this statement
So whenever a reduce call results in a new value for a b-tree node,
AND that node is the upper node of a subtree that is completely part
of a group key, that group key needs to be marked for recalculation.
misses some updates. I thought things looked something like
root
|
---------------
kp1 kp2
| |
--------- ---------
foo foo bar bar bar bar
where foo and bar are emitted keys from a map (values not show for the
sake of brevity). kp1 and kp2 hold reduce(foo,foo,bar) and
reduce(bar,bar,bar) respectively, and root holds rereduce(kp1,kp2).
When we request a view with group=true, Couch is smart enough to use
the reduction stored in kp2 directly, but it has to call
reduce(foo,foo) and reduce(bar) on the fly for the nodes underneath
kp1 (and then rereduce the bar reductions from kp1 and kp2 to get the
result for bar). If I interpreted Wout's statement correctly, it
ignores the case where any of the nodes under kp1 change, since kp1 is
not "the upper node of a subtree that is completely part of a group
key".
I think the problem of tracking which group keys to update can be made
simpler. We really only need to see the key updates coming out of the
incremental map. Couch knows the lists of keys to add and keys to
remove at that point; the set of all unique keys in those two lists
(the "changeset") is the set of group keys that would need to be
updated in the Review DB. Here I'm using "update" in the general
sense of add/remove/change; the way I see it, we could just query the
view for all the keys in the changeset. If the MR view has no results
for a given key, that obviously means delete the associated document
from the Review DB.
group_levels are a straightforward extension -- we just check what
group_level we'll be using in the Review DB when we calculate the
changeset.
Regards, Adam