kocolosk commented on issue #3562:
URL: https://github.com/apache/couchdb/issues/3562#issuecomment-840979261


   You're correct about the current implementation of the `skip` parameter. We 
keep statistics at the shard level that allow for efficient skipping of rows, 
but there's a bit of subtlety in computing the correct number of rows to skip 
in each shard of a database comprised of multiple shards. I sketched a solution 
to the problem in JIRA several years ago:
   
   https://issues.apache.org/jira/browse/COUCHDB-2784
   
   Quoting here:
   
   > For a database with Q shards and a request specifying ?skip=X We know that 
either a) at least one of the shards will end skipping at least `X / Q` rows, 
or b) the entire response body will be empty. So, I propose the following:
   > 1. Set the per-shard skip value to `X div Q`
   >     - If a shard has fewer than `X div Q` rows remaining it should send 
its last row
   >     - If `X div Q` is zero we can short-circuit and just use the current 
algorithm.
   > 1. The coordinator sorts the first responses from each shard. It then 
sends the key of the row that sorts first (let's call it Foo) back to all the 
shards
   > 1. Each shard counts the number of rows in between the original startkey 
and Foo and sends that number, then starts streaming with Foo as the new 
startkey
   > 1. The coordinator deducts the computed per-shard skip values from the 
user-specified skip and then takes care of the remainder in the usual way we do 
it today (i.e. by consuming the rows as they come in).
   >
   > [T]he algorithm could be executed recursively, e.g. if after one pass 
we're still left trying to consume a million rows in the coordinator we can 
farm that out to the shards again and identify an even better start key (Foo 
2.0). Lather, rinse, repeat until the number of rows that would be consumed by 
the coordinator is acceptably small.
   
   It's a moderately complex patch, and won't apply to 4.0. Efficient skips 
might be possible in 4.0 but it depends a bit on what sorts of statistics we 
might maintain in any index data structures we implement over FoundationDB (FDB 
itself doesn't provide any).
   
   I think we have a documentation bug here; we imply that large values of 
`skip` are fast for any release after 1.2, when it's really only fast for 1.x 
where x>2. But I would not call this a bug in CouchDB itself.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to