This question keeps coming up and it occurred to me that no one ever
mentions startkey_docid and endkey_docid when talking about this
issue. Theoretically the docid would be a prime candidate for
programmatically selecting an open or closed interval on either end of
the key range. Since DocID is guaranteed to be a string, we can use nil
and {} to select before key and after key respectively. (For
reference, startkey_docid actually defaults to nil and endkey_docid to
{} in the implementation.)
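
To make the idea concrete, here is a rough Python model of it (not
CouchDB code). It assumes view rows are ordered by (key, docid) and
uses made-up tagged tuples LOW and HIGH to stand in for nil and {};
the names select, docid, LOW and HIGH are all hypothetical.

```python
import bisect

# Hypothetical stand-ins for the Erlang terms nil and {}: tagged tuples
# that sort before and after every real (string) doc id.
LOW = (0, "")    # plays the role of nil
HIGH = (2, "")   # plays the role of {}


def docid(s):
    """Wrap a real doc id so it sorts between LOW and HIGH."""
    return (1, s)


def select(rows, startkey, start_docid, endkey, end_docid):
    """Return rows between (startkey, start_docid) and (endkey, end_docid).

    rows must be a sorted list of (key, wrapped_docid) pairs.
    """
    lo = bisect.bisect_left(rows, (startkey, start_docid))
    hi = bisect.bisect_right(rows, (endkey, end_docid))
    return rows[lo:hi]


rows = sorted([
    ("a", docid("doc1")),
    ("b", docid("doc2")),
    ("b", docid("doc3")),
    ("c", docid("doc4")),
])

closed = select(rows, "a", LOW, "c", HIGH)       # a <= k <= c
right_open = select(rows, "a", LOW, "c", LOW)    # a <= k <  c
left_open = select(rows, "a", HIGH, "c", HIGH)   # a <  k <= c
```

With the defaults (LOW at the start, HIGH at the end) both ends are
closed; swapping the sentinel on one end opens that end.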

So anyway, it turns out that someone thought ahead and realized that
those keys would only be strings and used list_to_binary instead of
?JSON_DECODE so this idea doesn't work automagically. But here's a
patch that shows the idea. (Only the idea though; to make this a real
patch we'd need to alter a bit of logic for when descending=true.)

Index: src/couchdb/couch_httpd_view.erl
===================================================================
--- src/couchdb/couch_httpd_view.erl    (revision 729771)
+++ src/couchdb/couch_httpd_view.erl    (working copy)
@@ -222,9 +222,23 @@
                 throw({query_parse_error, Msg})
             end;
         {"startkey_docid", DocId} ->
-            Args#view_query_args{start_docid=list_to_binary(DocId)};
+            case DocId of
+            "true" ->
+                Args#view_query_args{start_docid=nil};
+            "false" ->
+                Args#view_query_args{start_docid={}};
+            _ ->
+                Args#view_query_args{start_docid=list_to_binary(DocId)}
+            end;
         {"endkey_docid", DocId} ->
-            Args#view_query_args{end_docid=list_to_binary(DocId)};
+            case DocId of
+            "true" ->
+                Args#view_query_args{end_docid=nil};
+            "false" ->
+                Args#view_query_args{end_docid={}};
+            _ ->
+                Args#view_query_args{end_docid=list_to_binary(DocId)}
+            end;
         {"startkey", Value} ->
             case Keys of
             nil ->

Anyone got arguments against?

Paul

On Sun, Dec 28, 2008 at 9:12 AM,  <[email protected]> wrote:
> While writing something about using CouchDB I came across the issue of "slice 
> indexes" (called startkey and endkey in CouchDB lingo).
>
> I found no exact definition of startkey and endkey anywhere in the 
> documentation. Testing reveals that for access on _all_docs and on views, 
> documents are returned in the interval
>
> [startkey, endkey] = (startkey <= k <= endkey).
>
> I don't know if this was a conscious design decision. But I would like to 
> promote a slightly different interpretation (and thus API change):
>
> [startkey, endkey[ = (startkey <= k < endkey).
>
>
> Both approaches are valid and used in the real world. Ruby uses the inclusive 
> ("right-closed" in math speak) first approach:
>
>>> l = [1,2,3,4]
>>> l.slice(1..2)
> => [2, 3]
>
>
> Python uses the exclusive ("right-open" in math speak) second approach:
>
>>>> l = [1,2,3,4]
>>>> l[1:2]
> [2]
>
>
> For array indices both work fine and which one to prefer is mostly an issue 
> of habit. In spoken language both approaches are used: "Have the software 
> done by Saturday" probably means right-open to the client and right-closed 
> to the coder.
>
> But if you are working with keys that are more than array indexes, then 
> right-open is much easier to handle. That is because you have to *guess* the 
> biggest value you want to get. The Wiki at 
> http://wiki.apache.org/couchdb/View_collation contains an example of that 
> problem:
>
> It is suggested that you use
> startkey="_design/"&endkey="_design/ZZZZZZZZZ"
> or
> startkey="_design/"&endkey="_design/\u9999"
> to get a list of all design documents
>
> This breaks if a design document is named "ZZZZZZZZZTop" or 
> "\u9999Iñtërnâtiônàlizætiøn". Such names might be unlikely but we are computer 
> scientists; "unlikely" is a bad approach to software engineering.
>
> What we really want to ask CouchDB is to "get all documents with 
> keys starting with '_design/'".
>
> This is basically impossible to do with right-closed intervals. We could use 
> startkey="_design/"&endkey="_design0" ('0' is the ASCII character after '/') 
> and this will work fine ... until there is actually a document with the key 
> "_design0" in the system. Unlikely, but ...
>
> To make selection by intervals reliable, clients currently have to guess the 
> last key (the ZZZZ approach) or use the first key not to include (the _design0 
> approach) and then post-process the result to remove the last element 
> returned if it exactly matches the given endkey value.
>
>
> If CouchDB changed to a right-open interval approach, post-processing 
> would go away in most cases. See 
> http://blogs.23.nu/c0re/2008/12/building-a-track-and-trace-application-with-couchdb/
>  for two real world examples.
>
> At least for string keys and float keys changing the meaning to [startkey, 
> endkey[ would allow selections like
>
> * "all strings starting with 'abc'"
> * all numbers between 10.5 and 11
>
> It also would hopefully not break too much existing code. Since the notion of 
> endkey seems to be already considered "fishy" (see the ZZZZZ approach) most 
> code seems to try to avoid that issue. For example 
> 'startkey="_design/"&endkey="_design/ZZZZZZZZZ"' still would work unless you 
> have a design document named exactly "ZZZZZZZZZ".
>
> Regards
>
> Maximillian Dornseif
>
>
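
For reference, the "all keys starting with '_design/'" case from the
quoted message can be sketched in Python as a right-open range over a
sorted key list. This is only an illustration: prefix_range is a
made-up helper, it assumes a non-empty prefix, and it uses Python's
code point ordering rather than CouchDB's view collation.

```python
import bisect


def prefix_range(sorted_keys, prefix):
    """Return keys k with prefix <= k < upper (right-open interval).

    upper is the prefix with its last character bumped by one code
    point, e.g. "_design/" -> "_design0". Assumes a non-empty prefix.
    """
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    lo = bisect.bisect_left(sorted_keys, prefix)
    hi = bisect.bisect_left(sorted_keys, upper)  # excludes upper itself
    return sorted_keys[lo:hi]


keys = sorted([
    "_design/",
    "_design/ZZZZZZZZZTop",
    "_design/a",
    "_design0",
    "doc1",
])

design_docs = prefix_range(keys, "_design/")
```

Because the upper bound is excluded, a document whose key is exactly
"_design0" never shows up and no post-processing is needed.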