[ 
https://issues.apache.org/jira/browse/COUCHDB-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748492#action_12748492
 ] 

Paul Joseph Davis commented on COUCHDB-485:
-------------------------------------------

I highly doubt that I would be a fan of a patch for this. Unless I'm missing 
a way to do this efficiently, I'm pretty sure that this would be the same logic 
that would be required to use an array filter.

For instance, consider the worst-case scenario. You have a view that emits the 
same key for every document. Then, your query is a startkey that collates 
before the key, and startkey_docid collates to the last document id. The query 
would then have to scan through the entire view, which is unbounded in size. It 
could be argued that allowing for skip=N provides the same sinkhole in terms of 
efficiency, but that seems a bit more of an obvious user choice to me.
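
A toy model of that worst case may help. It assumes the view index can be 
pictured as a sorted Python list of (key, docid) tuples; the function name and 
the data are made up for illustration, and this is not how CouchDB is 
implemented internally.

```python
from bisect import bisect_left

# Hypothetical view where every document emits the same key.
# The list is already sorted by (key, docid).
rows = [("same", f"doc{i:04d}") for i in range(1000)]

def filter_from_docid(rows, startkey, min_docid):
    """Treat the docid as an independent filter, as the ticket requests.

    Returns the matching rows and the number of rows that had to be visited.
    """
    # Seeking to startkey is cheap: one binary search on the sorted list.
    start = bisect_left(rows, (startkey,))
    visited = 0
    matches = []
    # The docid test is not part of the sort position, so every row from
    # the seek point onward has to be inspected.
    for key, docid in rows[start:]:
        visited += 1
        if docid >= min_docid:
            matches.append((key, docid))
    return matches, visited

# startkey "a" collates before "same"; the docid bound is the last document.
matches, visited = filter_from_docid(rows, "a", "doc0999")
```

In this sketch one row matches, but all 1000 rows are visited to find it, and 
the cost grows with the size of the view.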

startkey_docid can definitely be confusing until you learn that btrees are 
sorted by (Key, DocId). And it's the same confusion that pops up with sorting 
arrays. However, it's not artificially limiting, it's just a limit of slicing a 
collated list.

Also, re-reading the ticket, I think there's some confusion. If you select an 
identical key range, then startkey_docid will select a range of documents as 
necessary. For instance, in the slug case, if you emit(doc.category, null) then 
?startkey=category&startkey_docid=first_docid&endkey=category&endkey_docid=last_docid,
 you will get back just the range of docids.
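
A small sketch of building that query string follows. It assumes that key 
parameters are sent as JSON-encoded values and docid parameters as plain 
strings; the helper name is hypothetical, and the values are the placeholders 
from the example above.

```python
import json
from urllib.parse import urlencode

def docid_range_query(key, first_docid, last_docid):
    """Build a query string selecting a docid range within a single key."""
    params = {
        # Assumption: view keys are JSON-encoded, so a string key is quoted.
        "startkey": json.dumps(key),
        "startkey_docid": first_docid,
        "endkey": json.dumps(key),
        "endkey_docid": last_docid,
    }
    return urlencode(params)

query = docid_range_query("category", "first_docid", "last_docid")
```

The resulting string would be appended after "?" to the view URL.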

> 'startkey_docid' should function like 'startkey'
> ------------------------------------------------
>
>                 Key: COUCHDB-485
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-485
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: HTTP Interface
>    Affects Versions: 0.10
>         Environment: N/A
>            Reporter: Christopher Groskopf
>            Priority: Minor
>
> The 'startkey_docid' and 'endkey_docid' parameters provide a way of 
> sub-selecting rows for pagination when a view emits many rows with identical 
> key values.  However, it seems both confusing and unintentionally limiting 
> that 'startkey_docid' does not function the same as 'startkey' with regard to 
> how included documents are identified.
> By this I mean that if a group of data is emitted with ISO 8601 timestamps 
> as keys (e.g. "2009-08-25T12:00:00Z") then it's possible to specify 
> 'startkey="2009-08"' and include that example data, because it is collated 
> after 'startkey'.  However, if those timestamps were emitted as doc ids 
> instead of keys, 'startkey_docid'  will only act to filter the data if it 
> _exactly_ matches a doc id.  Specifying 'startkey_docid="2009-08"' would not 
> filter the data at all, even if every selected row has the same key.
> The benefit of implementing this change is that views which emit many 
> identical keys could be sub-filtered based on document id.  In the case of my 
> application, the first portion of a document's id is a timestamp, so I would 
> be able to select a chronological subset of rows after they had been filtered 
> by key.  Another possible use case is where doc ids are slugs--this would 
> make it possible to select an alphabetical range after specifying a category 
> as a key parameter.
> I haven't looked under the hood and I have never written Erlang, so I have no 
> way of accurately estimating how significant this change would be.  Unless 
> I'm misunderstanding something, this change should not break existing code.
> Looking forward to reading any feedback/comments/alternatives.
> Thanks,
> Chris

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
