User-defined GroupRowsFun
-------------------------

                 Key: COUCHDB-403
                 URL: https://issues.apache.org/jira/browse/COUCHDB-403
             Project: CouchDB
          Issue Type: Wish
          Components: Database Core, HTTP Interface
            Reporter: Brian Candler
            Priority: Minor


CouchDB has hard-coded functionality for grouping. From the user's point of 
view: group_level=N will truncate Array keys to the first N elements, and 
that's it. (*)

It would be wonderful if application-specific grouping functions could be 
added. Useful examples include:

* for string keys, truncate to the first N characters (e.g. group by first 3 
letters of surname)
* for numeric keys, trunc(k/N) (e.g. divide by 100 would give you buckets of 
0..99, 100..199, 200..299 etc)
* combine with group_level: e.g. truncate array to first two elements plus the 
third element divided by 100

    ["string1","string2",Number,"rest"] => 
["string1","string2",trunc(Number/100)]

* for numeric keys: use trunc(log(k) * N) for exponential buckets
* for hexadecimal-string keys: right-shift N places
* ...etc

In each case N would be a parameter chosen at query time, like group_level is 
now.
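
To make the examples above concrete, here is a minimal sketch of what such 
truncation functions might look like. It is written in Python purely for 
illustration (the real hook would be Erlang), and every function name is 
hypothetical. Reading "right-shift N places" as a shift by N bits is also an 
assumption.

```python
import math

# All names below are hypothetical; none of them exist in CouchDB.

def trunc_string(key, n):
    """String keys: keep the first n characters."""
    return key[:n]

def trunc_number(key, n):
    """Numeric keys: trunc(k/N), e.g. n=100 gives buckets 0..99, 100..199."""
    return math.trunc(key / n)

def trunc_array_then_bucket(key, n):
    """Array keys: keep the first two elements, bucket the third by n."""
    return [key[0], key[1], math.trunc(key[2] / n)]

def trunc_log(key, n):
    """Numeric keys: exponential buckets; only defined for key > 0."""
    return math.trunc(math.log(key) * n)

def trunc_hex(key, n):
    """Hexadecimal-string keys: right-shift by n bits (assumed meaning)."""
    return int(key, 16) >> n
```

For instance, trunc_array_then_bucket(["string1","string2",250,"rest"], 100) 
gives ["string1","string2",2].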

It would be sufficient just to have a hook to statically link Erlang functions 
to do this. There would then need to be two new HTTP parameters: one to choose 
the grouping function and one for any arguments it needs.

Theoretically this function could also be handed off to the external view 
server so the logic could be written in Javascript or whatever, but I think it 
would be too slow in practice.

Note: group truncation functions would need to meet certain constraints to 
work with the grouping logic. Something like:
   K1 <= K2 implies grouptrunc(K1) <= grouptrunc(K2)
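
As a rough way to test a candidate function against this constraint, one 
could check it over a sample of keys. The sketch below is Python for 
illustration only; is_order_preserving is a hypothetical helper, and it uses 
Python's own ordering rather than CouchDB's view collation, so it is only 
meaningful for keys where the two agree (e.g. plain numbers).

```python
import math

def is_order_preserving(grouptrunc, sample_keys):
    """Check K1 <= K2 implies grouptrunc(K1) <= grouptrunc(K2) on a sample.

    Comparing neighbours in sorted order is enough, because <= is
    transitive. A True result is evidence, not proof: it only covers
    the sampled keys.
    """
    keys = sorted(sample_keys)
    truncated = [grouptrunc(k) for k in keys]
    return all(a <= b for a, b in zip(truncated, truncated[1:]))

# Bucketing by 100 keeps the order of the keys.
ok = is_order_preserving(lambda k: math.trunc(k / 100), range(0, 1000, 7))

# Taking the remainder does not: 50 -> 50 but 100 -> 0.
bad = is_order_preserving(lambda k: k % 100, range(0, 300, 50))
```

Here ok is True and bad is False. A function like the second one would 
scatter members of the same group across non-adjacent rows.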

(*) It's not implemented exactly like that. As far as I can see, there's one 
function to compare keys for equality by looking at the first N elements 
(GroupRowsFun), and another function truncates them when emitting them 
(RespFun). For adding bolt-on functions it would be more convenient just to 
define a single group key truncation function.
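
To illustrate why a single function would be enough, here is a sketch 
(Python, illustration only) in which both the equality test and the emitted 
key are derived from one truncation function. The names make_group_funs, 
group_rows_fun, resp_fun and group_reduce are hypothetical stand-ins and do 
not mirror the actual Erlang code.

```python
from itertools import groupby

def make_group_funs(grouptrunc):
    """Derive both roles from one truncation function."""
    def group_rows_fun(k1, k2):
        # Two keys belong to the same group if they truncate equally.
        return grouptrunc(k1) == grouptrunc(k2)

    def resp_fun(key):
        # The key emitted for a group is the truncated key itself.
        return grouptrunc(key)

    return group_rows_fun, resp_fun

def group_reduce(rows, grouptrunc, reduce_fun):
    """Walk (key, value) rows sorted by key, reducing each run of
    adjacent rows that share a truncated key."""
    return [
        (group_key, reduce_fun([value for _, value in run]))
        for group_key, run in groupby(rows, key=lambda row: grouptrunc(row[0]))
    ]

rows = [(5, 1), (50, 1), (150, 1), (250, 1), (260, 1)]
result = group_reduce(rows, lambda k: k // 100, sum)
```

With these rows, result is [(0, 2), (1, 1), (2, 2)].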

