I've been pondering this issue of the weird _design/ doc hack. I'd
either agree with Zach on having separately named keys for open or
closed bounds on *both* ends, or, specific to the string and array
types, a startswith parameter. I don't much like the startswith idea
though, as it's not generally applicable.

Also, did I miss what you'd pass in the _design doc scenario as the end
key, assuming right-open semantics?
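
To make the question concrete, here is a rough Python sketch of how I
read the right-open proposal for the "_design/" case. It is only an
illustration: prefix_end_key and right_open_range are hypothetical
helper names, not anything in CouchDB, and the sketch assumes plain
code-point ordering of keys, which may not match CouchDB's actual
collation.

```python
from bisect import bisect_left


def prefix_end_key(prefix):
    """Return the smallest string sorting after every string that starts
    with `prefix`, assuming plain code-point ordering (an assumption,
    not necessarily CouchDB's collation)."""
    if not prefix:
        raise ValueError("an empty prefix has no finite upper bound")
    last = ord(prefix[-1])
    if last == 0x10FFFF:
        raise ValueError("last character cannot be incremented")
    # Bump the last character: "_design/" becomes "_design0".
    return prefix[:-1] + chr(last + 1)


def right_open_range(sorted_keys, startkey, endkey):
    """Return the keys k with startkey <= k < endkey."""
    lo = bisect_left(sorted_keys, startkey)
    hi = bisect_left(sorted_keys, endkey)  # first index with key >= endkey
    return sorted_keys[lo:hi]


# Made-up document ids, including the awkward ones from the issue.
keys = sorted(["_design/ZZZZZZZZZTop", "_design/app", "_design0", "doc1"])
end = prefix_end_key("_design/")
result = right_open_range(keys, "_design/", end)
print(end, result)
```

Under those assumptions the end key would be "_design0", and a document
that really is named "_design0" stays out of the result without any
post-processing.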

On Thu, Feb 5, 2009 at 4:57 PM, Zachary Zolton <[email protected]> wrote:
> Maximillian,
>
> I'd think both _could_ be useful.
>
> I mean in Ruby we have both for the right-hand boundary of ranges:
>  irb(main):005:0> (1..5).max
>  => 5
>  irb(main):006:0> (1...5).max
>  => 4
>
> IMHO, it would be better to use a different pair of parameter names,
> such that we could easily distinguish between open and closed bounds.
>
>
> Cheers,
>
> Zach
>
>
> PS. Is it "Maximillian" or "Max"?  :^D
>
> On Thu, Feb 5, 2009 at 3:32 PM, Maximillian Dornseif (JIRA)
> <[email protected]> wrote:
>>
>>    [ 
>> https://issues.apache.org/jira/browse/COUCHDB-194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670911#action_12670911
>>  ]
>>
>> Maximillian Dornseif commented on COUCHDB-194:
>> ----------------------------------------------
>>
>> So far nobody seems against it.
>>
>> The downside is that it MIGHT break some existing code.
>>
>>> [startkey, endkey[: provide a right-open range selection method
>>> ---------------------------------------------------------------
>>>
>>>                 Key: COUCHDB-194
>>>                 URL: https://issues.apache.org/jira/browse/COUCHDB-194
>>>             Project: CouchDB
>>>          Issue Type: Improvement
>>>          Components: HTTP Interface
>>>    Affects Versions: 0.9
>>>            Reporter: Maximillian Dornseif
>>>            Priority: Blocker
>>>             Fix For: 1.0
>>>
>>>
>>> While writing something about using CouchDB I came across the issue of 
>>> "slice indexes" (called startkey and endkey in CouchDB lingo).
>>> I found no exact definition of startkey and endkey anywhere in the 
>>> documentation. Testing reveals that for access on _all_docs and on views, 
>>> documents are returned in the interval
>>> [startkey, endkey] = (startkey <= k <= endkey).
>>> I don't know if this was a conscious design decision. But I'd like to promote 
>>> a slightly different interpretation (and thus API change):
>>> [startkey, endkey[ = (startkey <= k < endkey).
>>> Both approaches are valid and used in the real world. Ruby uses the 
>>> inclusive ("right-closed" in math speak) first approach:
>>> >> l = [1,2,3,4]
>>> >> l.slice(1..2)
>>> => [2, 3]
>>> Python uses the exclusive ("right-open" in math speak) second approach:
>>> >>> l = [1,2,3,4]
>>> >>> l[1:2]
>>> [2]
>>> For array indices both work fine and which one to prefer is mostly an issue 
>>> of habit. In spoken language both approaches are used: "Have the software 
>>> done by Saturday" probably means right-open to the client and 
>>> right-closed to the coder.
>>> But if you are working with keys that are more than array indexes, then 
>>> right-open is much easier to handle. That is because with right-closed 
>>> intervals you have to *guess* the biggest value you want to get. The Wiki at 
>>> http://wiki.apache.org/couchdb/View_collation contains an example of that 
>>> problem:
>>> It is suggested that you use
>>> startkey="_design/"&endkey="_design/ZZZZZZZZZ"
>>> or
>>> startkey="_design/"&endkey="_design/\u9999"
>>> to get a list of all design documents - also the replication system in the 
>>> db core uses the same hack.
>>> This breaks if a design document is named "ZZZZZZZZZTop" or 
>>> "\u9999Iñtërnâtiônàlizætiøn". Such names might be unlikely but we are 
>>> computer scientists; "unlikely" is a bad approach to software engineering.
>>> The thing we really want to ask CouchDB is to "get all documents with 
>>> keys starting with '_design/'".
>>> This is basically impossible to do with right-closed intervals. We could 
>>> use startkey="_design/"&endkey="_design0" ('0' is the ASCII character after 
>>> '/') and this will work fine ... until there is actually a document with 
>>> the key "_design0" in the system. Unlikely, but ...
>>> To make selection by intervals reliable, clients currently have to guess the 
>>> last key (the ZZZZ approach) or use the first key not to include (the 
>>> _design0 approach) and then post-process the result to remove the last 
>>> element returned if it exactly matches the given endkey value.
>>> If CouchDB changed to a right-open interval approach, post-processing 
>>> would go away in most cases. See 
>>> http://blogs.23.nu/c0re/2008/12/building-a-track-and-trace-application-with-couchdb/
>>>  for two real world examples.
>>> At least for string keys and float keys changing the meaning to [startkey, 
>>> endkey[ would allow selections like
>>> * "all strings starting with 'abc'"
>>> * all numbers between 10.5 and 11
>>> It would hopefully also not break too much existing code. Since the notion 
>>> of endkey seems to be already considered "fishy" (see the ZZZZZ approach) 
>>> most code seems to try to avoid that issue. For example 
>>> 'startkey="_design/"&endkey="_design/ZZZZZZZZZ"' still would work unless 
>>> you have a design document being named exactly "ZZZZZZZZZ".
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
>>
>
