I ran some experiments and figured out a few things:
If a view's key has components, CouchDB will indeed maintain
intermediate reduce results at each group level. It will use these
intermediate results to efficiently calculate arbitrary ranges. For
example, if I ask for sum(2008-03-11 to 2008-07-25), CouchDB calls
reduce twice. The first call sums all the included days in March and
July. The second reduce has combine=true and sums the previous
result with April, May and June, whose sums are already in the index.
CouchDB also seems to intrinsically partition the keyspace into groups of
approximately 43-45. I don't know the significance of this number but it
is probably some tweaked threshold value for the b-tree algorithm.
The bottom line is that reduced views with arbitrary key ranges run in
logarithmic time, without doing anything special.
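
To make the March-to-July example concrete, here is a toy model in Python. It is only a sketch of the idea, not CouchDB's actual b-tree code: the per-month dictionary stands in for the intermediate reduce results kept in the index, and the names build_month_index, month_bounds and range_sum are made up for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def build_month_index(rows):
    """Precompute one sum per (year, month).

    Stands in for the intermediate reduce results stored in the index.
    """
    index = defaultdict(int)
    for day, amount in rows.items():
        index[(day.year, day.month)] += amount
    return dict(index)


def month_bounds(year, month):
    """Return the first and last day of the given month."""
    first = date(year, month, 1)
    next_first = date(year + (month == 12), month % 12 + 1, 1)
    return first, next_first - timedelta(days=1)


def range_sum(rows, month_index, start, end):
    """Sum amounts for start <= day <= end, reusing whole-month sums."""
    partial_days = 0   # first step: days in partially covered months
    whole_months = 0   # second step: sums that are already in the index
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        first, last = month_bounds(year, month)
        if start <= first and last <= end:
            # Month is fully inside the range: reuse the stored sum.
            whole_months += month_index.get((year, month), 0)
        else:
            # Month is only partly inside the range: sum its days directly.
            lo, hi = max(start, first), min(end, last)
            partial_days += sum(v for d, v in rows.items() if lo <= d <= hi)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    # Combine the freshly reduced days with the stored month sums.
    return partial_days + whole_months
```

For the range 2008-03-11 to 2008-07-25 only the March and July days are summed individually; April, May and June come straight from the precomputed index.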
Chris Anderson wrote:
On Sat, Nov 22, 2008 at 9:09 PM, Jedediah Smith
<[EMAIL PROTECTED]> wrote:
A possible compromise would be to use group_level to find the balance per
component and then add those together on the client. Example:
balance(2008-11-22) =
sum(-inf to 2007-) +
sum(2008-01- to 2008-10-) +
sum(2008-11-01 to 2008-11-22)
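
A rough Python sketch of that client-side addition, in case it helps. The query_sum callable is hypothetical: it stands for whatever function asks the view for the reduced sum over an inclusive date range. The split into prior years, prior whole months and days of the current month follows the example above.

```python
from datetime import date, timedelta


def balance_components(day):
    """Split everything up to `day` into three inclusive date ranges:
    all earlier years, earlier whole months of this year, and the days
    of the current month up to `day`."""
    ranges = [(date.min, date(day.year - 1, 12, 31))]
    if day.month > 1:
        last_of_prev_month = date(day.year, day.month, 1) - timedelta(days=1)
        ranges.append((date(day.year, 1, 1), last_of_prev_month))
    ranges.append((date(day.year, day.month, 1), day))
    return ranges


def balance(day, query_sum):
    """Add the per-component sums together on the client.

    `query_sum(start, end)` is a hypothetical helper that returns the
    reduced value for one range.
    """
    return sum(query_sum(start, end) for start, end in balance_components(day))
```

The ranges are contiguous and do not overlap, so each transaction is counted exactly once.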
This looks like the right way to combine multiple time ranges to me.
Adding on the client is a fine thing in a case like this. However, I
think you can do it in a single query.
If a view like the
above existed and I updated an old transaction, there would only be one
rereduce for each group level, right?
Querying with group=false will be faster, I think. (I should benchmark this...)
In the normal case, with a modest amount of data, that's about right.
Each grouped view query (I think... I really should bust out the log()
in the views to know for sure...) will fire at least one JavaScript
rereduce. In the case of very very much data and a first time reduce
query over that range, the rereduce could run a few times, but the #
of rereduces run should increase only logarithmically with the # of
rows, if I'm not mistaken. It's only when you run multiple queries (or
multiple reduces for groups within a range) that you're likely to run
into a linear increase in the number of rereduces. Again, this should
be explored in the log, but I think you'll get a minimum of 1 rereduce
per group query.
The simplest query to get someone's running balance would be something like:
_view/viewname?startkey=["bob", BEGINNING_OF_TIME]&endkey=["bob", CURRENT_DATE]
which has an implicit reduce=true&group=false.
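
For illustration, the query string could be built from Python roughly like this. The assumptions are mine: keys are JSON arrays, the date uses the array format (for example [2008, 11, 22]), and null is used for BEGINNING_OF_TIME because null sorts before every other value in view collation. The function name running_balance_query is made up.

```python
import json
from urllib.parse import urlencode


def running_balance_query(view_path, account, current_date):
    """Build a view URL covering every row for `account` up to `current_date`.

    `current_date` is assumed to be a list such as [2008, 11, 22].
    reduce=true and group=false are left implicit, as in the query above.
    """
    params = {
        # null collates lowest, so it acts as the beginning of time.
        "startkey": json.dumps([account, None]),
        "endkey": json.dumps([account, current_date]),
    }
    # urlencode percent-escapes the JSON so it is safe in a URL.
    return view_path + "?" + urlencode(params)
```

Calling running_balance_query("_view/viewname", "bob", [2008, 11, 22]) gives a URL whose decoded startkey is ["bob", null] and whose decoded endkey is ["bob", [2008, 11, 22]].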
BTW Jan I really like your array date format.