nickva opened a new pull request, #4238:
URL: https://github.com/apache/couchdb/pull/4238

   Use the new `fabric:open_revs/3` API implemented in #4201 to optimize the 
`_bulk_get` HTTP API. Since `open_revs/3` itself is new, allow reverting to 
individual doc fetches using the previous `open_revs/4` API via a config 
setting, mostly as a precautionary measure.
   
   The implementation consists of three main parts:
     * Parse and validate args
     * Fetch the docs using `open_revs/3` or `open_revs/4`
     * Emit results as json or multipart, based on the `Accept` header value
   
   Parsing and validation check for various errors and then return a map of 
`#{Ref => {DocId, RevOrError, DocOptions}}` and a list of Refs in the original 
argument order. The middle tuple element, `RevOrError`, is notable in that it 
may hold either the revision (`[Rev]` or `all`) or 
`{error, {Rev, ErrorTag, ErrorReason}}`.
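
   The shape of that parsed result can be sketched in Python. This is only an illustration, not the Erlang code from the PR: `parse_bulk_get_args`, `is_valid_rev`, the `uuid` stand-in for an Erlang reference, and the `bad_request` tag and reason string are all assumptions made for the example.

```python
import uuid


def is_valid_rev(rev):
    # Hypothetical check: a rev looks like "<number>-<hash>".
    if not isinstance(rev, str):
        return False
    pos, sep, rev_hash = rev.partition("-")
    return pos.isdigit() and sep == "-" and rev_hash != ""


def parse_bulk_get_args(docs):
    """Return (args_by_ref, refs) for a list of {"id": ..., "rev": ...} dicts.

    args_by_ref maps Ref -> (DocId, RevOrError, DocOptions);
    refs keeps the original argument order.
    """
    args_by_ref = {}
    refs = []
    for item in docs:
        ref = uuid.uuid4().hex  # stand-in for an Erlang reference
        rev = item.get("rev")
        if rev is None:
            rev_or_error = "all"
        elif is_valid_rev(rev):
            rev_or_error = [rev]
        else:
            # Error tag and reason are made up for this sketch.
            rev_or_error = ("error", (rev, "bad_request", "Invalid rev format"))
        args_by_ref[ref] = (item.get("id"), rev_or_error, [])
        refs.append(ref)
    return args_by_ref, refs
```

   Walking `refs` and looking each one up in the map yields the arguments in request order, with invalid entries already carrying their error.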
   
   Fetching the docs is fairly straightforward. The slightly interesting aspect 
is that when an error is returned from `open_revs/3`, we have to pretend that 
all the batched docs failed with that error. That is done to preserve the "zip" 
property, where every input argument has its matching result at the same 
position in the results list. We also fixed a bug here: the error returned from 
`fabric:open_revs/3,4` was not formatted in a way that could be emitted as 
json, which resulted in a `function_clause` error. That is why we call 
`couch_util:to_binary/1` on it. This was detected by the integration tests 
described below and was missed by the previous mocked unit test.
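
   A minimal Python sketch of the "zip" property, assuming a hypothetical `open_revs` callable that returns either `("ok", results)` or `("error", reason)`. The names and the use of `str()` in place of `couch_util:to_binary/1` are assumptions for illustration, not the PR's code.

```python
def fetch_batch(args, open_revs):
    """Pair every argument with a result, even when the whole batch fails."""
    tag, payload = open_revs(args)
    if tag == "ok":
        results = payload
    else:
        # Make the reason serializable (the role couch_util:to_binary/1
        # plays in the PR), then repeat it once per argument.
        reason = str(payload)
        results = [("error", reason)] * len(args)
    # Position i of the output always corresponds to position i of the input.
    return list(zip(args, results))
```

   Because the error is repeated once per argument, callers can iterate over inputs and results together without special-casing a failed batch.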
   
   The last part is emitting the results as either json or multipart. Here most 
changes are cleanups and grouping into separate handler functions. The `Accept` 
header can be either `multipart/related` or `multipart/mixed`, and we try to 
emit the same content type that was passed in the `Accept` header. One notable 
thing here: by DRY-ing the filtering of attachments in 
`non_stubbed_attachments/1` we fixed another bug, where the multipart result 
was wrong when all attachments were stubs. The doc was returned as a multipart 
chunk with content type `multipart/...` instead of `application/json`. This was 
also detected by the integration tests described below.
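
   The stub-filtering decision can be sketched as follows. This is a simplified Python model, assuming attachments are dicts with an optional `stub` flag; `chunk_content_type` and its `multipart_type` default are hypothetical names and values, and only `non_stubbed_attachments` mirrors a function named in this PR.

```python
def non_stubbed_attachments(attachments):
    """Keep only attachments that carry real data (not stubs)."""
    return [att for att in attachments if not att.get("stub", False)]


def chunk_content_type(attachments, multipart_type="multipart/related"):
    """Pick the content type for one document chunk.

    A doc is wrapped in a nested multipart chunk only when it has at least
    one non-stub attachment; otherwise it is plain JSON.
    """
    if non_stubbed_attachments(attachments):
        return multipart_type
    return "application/json"
```

   With a single filtering helper, the "has attachments to send" check and the code that writes them cannot disagree, which is the class of bug described above.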
   
   The largest changes are in the testing area. Previous multipart tests were 
using mocks heavily, were quite fragile, and didn't have good coverage. Those 
tests were removed and replaced by new end-to-end tests in 
`chttpd_bulk_get_test.erl`. To make that happen, we added a simple multipart 
parser utility function which knows how to parse multipart responses into maps. 
Those maps preserve chunk headers, and we can match them with 
`?assertMatch(...)` fairly easily. The tests try to get decent coverage of the 
`chttpd_db.erl` `_bulk_get` implementation and its utility functions, but they 
are also end-to-end tests, so they exercise everything below, including the 
fabric and couch layers.
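
   As a rough illustration of parsing a multipart response into maps, here is a Python sketch. It is not the Erlang test helper from the PR: `parse_multipart` is a hypothetical name, it assumes CRLF line endings and a known boundary, and it does not handle nested multipart chunks.

```python
def parse_multipart(body, boundary):
    """Split a multipart body into a list of {"headers": {...}, "body": str}."""
    delimiter = "--" + boundary
    chunks = []
    # Everything before the first delimiter is preamble and is skipped.
    for part in body.split(delimiter)[1:]:
        if part.startswith("--"):
            break  # closing delimiter: "--boundary--"
        part = part.strip("\r\n")
        head, _, content = part.partition("\r\n\r\n")
        headers = {}
        for line in head.split("\r\n"):
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        chunks.append({"headers": headers, "body": content})
    return chunks
```

   Keeping headers in each map lets a test assert on a chunk's content type and body together.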
   
   Quick single-node testing, using couchdyno to replicate 1 million docs, 
shows at least a 2x speedup in completing the replication with this PR.
   
   On main:
   
   ```
   r=rep.Rep(); r.replicate_1_to_n_and_compare(1, num=1000000, normal=True)
   330 sec
   ```
   
   With this PR:
   ```
   r=rep.Rep(); r.replicate_1_to_n_and_compare(1, num=1000000, normal=True)
   160 sec
   ```
   
   Individual `_bulk_get` response times show an even larger improvement, about 
an 8x speedup:
   
   On main:
   ```
   [notice] ... POST /cdyno-0000001/_bulk_get?latest=true&revs=true&attachments=false 200 ok 468
   [notice] ... POST /cdyno-0000001/_bulk_get?latest=true&revs=true&attachments=false 200 ok 479
   ```
   
   With this PR:
   ```
   [notice] ... POST /cdyno-0000001/_bulk_get?latest=true&revs=true&attachments=false 200 ok 54
   [notice] ... POST /cdyno-0000001/_bulk_get?latest=true&revs=true&attachments=false 200 ok 61
   ```
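
   For reference, the ratios behind the "2x" and "8x" figures can be computed from the numbers quoted above. The `speedup` helper is just for this illustration, and averaging the two logged values per case is an assumption about how to summarize them.

```python
def speedup(before, after):
    """Ratio of the old duration to the new one."""
    return before / after


# Full replication of 1 million docs: 330 sec on main vs 160 sec with the PR.
replication = speedup(330, 160)

# Individual _bulk_get requests: mean of the two logged values in each case.
bulk_get = speedup((468 + 479) / 2, (54 + 61) / 2)

print(f"replication: {replication:.2f}x, _bulk_get: {bulk_get:.2f}x")
```

   This works out to roughly 2.1x for the full replication and roughly 8.2x for individual requests.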
   
   Fixes: https://github.com/apache/couchdb/issues/4183

