kocolosk commented on issue #3562: URL: https://github.com/apache/couchdb/issues/3562#issuecomment-840979261
You're correct about the current implementation of the `skip` parameter. We keep statistics at the shard level that allow for efficient skipping of rows, but there's a bit of subtlety in computing the correct number of rows to skip in each shard of a database comprised of multiple shards. I sketched a solution to the problem in JIRA several years ago: https://issues.apache.org/jira/browse/COUCHDB-2784 Quoting here: > For a database with Q shards and a request specifying ?skip=X We know that either a) at least one of the shards will end skipping at least `X / Q` rows, or b) the entire response body will be empty. So, I propose the following: > 1. Set the per-shard skip value to `X div Q` > - If a shard has fewer than `X div Q` rows remaining it should send its last row > - If `X div Q` is zero we can short-circuit and just use the current algorithm. > 1. The coordinator sorts the first responses from each shard. It then sends the key of the row that sorts first (let's call it Foo) back to all the shards > 1. Each shard counts the number of rows in between the original startkey and Foo and sends that number, then starts streaming with Foo as the new startkey > 1. The coordinator deducts the computed per-shard skip values from the user-specified skip and then takes care of the remainder in the usual way we do it today (i.e. by consuming the rows as they come in). > > [T]he algorithm could be executed recursively, e.g. if after one pass we're still left trying to consume a million rows in the coordinator we can farm that out to the shards again and identify an even better start key (Foo 2.0). Lather, rinse, repeat until the number of rows that would be consumed by the coordinator is acceptably small. It's a moderately complex patch, and won't apply to 4.0. Efficient skips might be possible in 4.0 but it depends a bit on what sorts of statistics we might maintain in any index data structures we implement over FoundationDB (FDB itself doesn't provide any). I think we have a documentation bug here; we imply that large values of `skip` are fast for any release after 1.2, when it's really only fast for 1.x where x>2. But I would not call this a bug in CouchDB itself. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
