janl commented on issue #745: Replication with attachments never completes, 
{mp_parser_died,noproc} error
URL: https://github.com/apache/couchdb/issues/745#issuecomment-369905803
 
 
   Great repro, Joan. I played with it and came up with this:
   
   The Python script uses the standalone attachment API (`PUT /db/doc/att`). The handler for this request does NOT apply `max_http_request_size` (that check happens in [`chttpd:body/2`](https://github.com/apache/couchdb/blob/master/src/chttpd/src/chttpd.erl#L628-L643) or [`couch_httpd:check_max_request_length/1`](https://github.com/apache/couchdb/blob/master/src/couch/src/couch_httpd.erl#L452-L460), neither of which is used by the standalone attachment API).
   
   The twist is that the replicator uses multipart requests, not standalone attachment requests, and multipart requests ARE subject to the `max_http_request_size` limit.
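   To make the size difference concrete, here is a rough sketch (hypothetical framing and names, not CouchDB's actual wire format) of the kind of `multipart/related` body the replicator sends: the document JSON and every attachment travel in one request, so `max_http_request_size` is checked against the whole envelope, never against an attachment on its own.

   ```python
   # Sketch only: simplified multipart/related framing, boundary chosen arbitrarily.
   BOUNDARY = b"abc123"

   def multipart_related_body(doc_json, attachments):
       """Assemble one body out of a JSON part plus one part per attachment."""
       parts = [b"--" + BOUNDARY + b"\r\nContent-Type: application/json\r\n\r\n" + doc_json]
       for att in attachments:
           parts.append(b"--" + BOUNDARY + b"\r\n\r\n" + att)
       return b"\r\n".join(parts) + b"\r\n--" + BOUNDARY + b"--\r\n"

   doc_json = b'{"_id":"mydoc","_attachments":{"att":{"follows":true}}}'
   attachment = b"x" * 1000  # stand-in for the real attachment bytes
   body = multipart_related_body(doc_json, [attachment])

   # The limit applies to len(body), which always exceeds the attachment alone.
   assert len(body) > len(attachment) + len(doc_json)
   ```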
   
   This leads to the observed behaviour: you can create an attachment in one db, but can NOT replicate that attachment to another db on the same CouchDB node (or to another node with the same `max_http_request_size` limit).
   
   Applying `max_http_request_size` in the standalone attachment API is trivial [1], but that leads to the next unfortunate behaviour:
   
   Say you create a doc with two attachments, each just under `max_http_request_size` in length. Each individual attachment write will succeed, but replicating the doc to another db will, again, produce a multipart request that overall is larger than `max_http_request_size`.
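   A back-of-the-envelope check of that scenario (the limit, JSON size, and MIME overhead below are made-up illustrative numbers):

   ```python
   # Hypothetical numbers: two attachments that each pass the per-request
   # check still overflow the combined multipart replication request.
   max_http_request_size = 64 * 1024 * 1024   # assumed limit: 64 MiB

   att_size = max_http_request_size - 1024    # each attachment just under the limit

   # Each standalone PUT /db/doc/att carries only one attachment:
   assert att_size < max_http_request_size    # so both uploads are accepted

   # Replication sends ONE multipart request with BOTH attachments plus the
   # document JSON and MIME framing (illustrative sizes):
   doc_json_size = 512
   mime_overhead = 256
   multipart_size = 2 * att_size + doc_json_size + mime_overhead

   assert multipart_size > max_http_request_size   # replication request is rejected
   ```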
   
   I haven't checked this, but a conflicting doc with one attachment smaller than `max_http_request_size`, where the attachment data is conflicted, might also produce a multipart HTTP request larger than `max_http_request_size`, since both conflicting revisions and their attachment bodies are replicated together.
   
   This leads us to having to decide:
   1. Is `max_http_request_size` a hard limit, or do we accept requests larger than that if they are multipart HTTP requests?
      - If yes, do we apply `max_document_size` and `max_attachment_size` to the individual parts of the multipart request?
   2. If not 1., do we need to rewrite the replicator so it does not produce requests larger than `max_http_request_size`, potentially sending attachments individually?
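   Option 2 could amount to a planning step roughly like this sketch (a hypothetical function, not existing replicator code): fall back to one request per attachment whenever the combined multipart body would exceed the limit.

   ```python
   # Hypothetical request planner: instead of one multipart PUT, split a doc
   # with large attachments into one doc PUT plus one standalone attachment
   # PUT per attachment, so each request stays under the limit.
   def plan_requests(doc_size, att_sizes, limit):
       combined = doc_size + sum(att_sizes)
       if combined <= limit:
           return [combined]              # one multipart request is fine
       # Otherwise: the doc body alone, then each attachment individually.
       return [doc_size] + att_sizes

   limit = 64 * 1024 * 1024               # assumed max_http_request_size
   plan = plan_requests(doc_size=512, att_sizes=[limit - 1024, limit - 1024], limit=limit)

   assert len(plan) == 3                  # doc + two attachment requests
   assert all(size <= limit for size in plan)  # every request now passes the check
   ```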
   
   
   References:
   [1]:
    ```diff
    --- a/src/chttpd/src/chttpd_db.erl
    +++ b/src/chttpd/src/chttpd_db.erl
    @@ -1218,6 +1218,7 @@ db_attachment_req(#httpd{method=Method, user_ctx=Ctx}=Req, Db, DocId, FileNamePa
                     undefined -> <<"application/octet-stream">>;
                     CType -> list_to_binary(CType)
                 end,
    +           couch_httpd:check_max_request_length(Req),
                Data = fabric:att_receiver(Req, chttpd:body_length(Req)),
                ContentLen = case couch_httpd:header_value(Req, "Content-Length") of
                    undefined -> undefined;
    ```
