nickva commented on issue #1200: [DISCUSS] CouchDB Request Size Limits
URL: https://github.com/apache/couchdb/pull/1200#issuecomment-370489809
 
 
   Good write-up @janl 
   
   `max_http_request_size` is a quick and dirty way of specifying limits, but it is 
not very accurate and still has loopholes. Because we implemented other, more 
specific limits, its usefulness has decreased. Moreover, because we fixed how it 
checks the limit and closed some of those loopholes, its default value started 
breaking users' code. If the cost were the same, I think most users would rather 
set the more specific limits (doc size, attachment size, max number of 
attachments, etc.) than think of those limits and then apply some formula to 
calculate a max http request size.
   
   Maybe `max_http_request_size` should be adjusted to the upper bound expected by 
users: `max_doc_size + max_attachment_size * max_attachments_per_doc`. Users are 
already uploading attachments larger than 64Mb and those seem to work. We even 
fixed a few issues so that uploads of large attachments no longer time out, so 
chances are users will hit this limit more often now. So maybe at least 500Mb 
(about a 10x increase) or even 1G? Users who worry about DOS would have to apply 
limits as they see fit for their environment.
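   
   To make the arithmetic concrete, here is a rough config sketch. The numbers are 
purely illustrative (not defaults), `max_attachments_per_doc` is only a proposed 
setting and does not exist yet, and the section holding `max_http_request_size` 
may differ between versions:
   
   ```ini
   ; Sketch only: values are examples, and max_attachments_per_doc is a
   ; proposed setting, not an existing one.
   [couchdb]
   max_document_size = 1048576        ; 1 MiB per document body
   max_attachment_size = 67108864     ; 64 MiB per attachment
   max_attachments_per_doc = 8        ; proposed
   
   ; Upper bound for one multipart doc PUT: 1 MiB + 8 * 64 MiB = 513 MiB,
   ; so the request size limit could sit a bit above that.
   [httpd]                            ; may be [chttpd] depending on version
   max_http_request_size = 545259520  ; ~520 MiB
   ```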
   
   Replying to the summary specifically:
   
   > 1.i & ii
   
    From a user's perspective I think 1.i is buggy and just as random a failure as 
1.ii. Since during replication the only indication of a failed write is a bump of 
the failed doc write count, which the majority of users don't know about, the 
breakage is quite insidious. It might lead to invalid backups, and it might take 
years before users notice the missing data in their backups. So from this point 
of view I'd pick 1.ii as better than 1.i. A DOS is terrible but at least 
immediately apparent. Attachments mysteriously disappearing during replication, 
only to be discovered much later, is a more serious issue. I can already imagine 
the "My db ate my data" blog posts.
   
   > 1.iii
   
   I like the `max_attachments_per_doc` idea. Given a max doc size, a max 
attachment size and a max number of attachments per doc, we could almost 
automatically generate a max http request size value for multipart doc PUTs, 
attachment PUTs and single document updates.
   
   There is one more place where we'd need a limit to completely constrain the 
http request size based on the more precise limits: `_bulk_docs` requests. We'd 
have to restrict the max number of docs posted there. Then, given the max doc 
size and the max number of docs per `_bulk_docs` request, we can automatically 
calculate a reasonable upper bound on request sizes for most 
"update/modify/create" APIs, as sketched below.
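   
   A minimal sketch of that arithmetic, using a hypothetical 
`max_docs_per_bulk_doc_request` setting (the name and numbers are made up here):
   
   ```ini
   ; Hypothetical sketch: neither the setting name nor the values are real defaults.
   [couchdb]
   max_document_size = 1048576            ; 1 MiB
   max_docs_per_bulk_doc_request = 1000   ; proposed cap on _bulk_docs batch size
   
   ; A _bulk_docs request would then be bounded by roughly
   ; 1000 * 1 MiB ~= 1000 MiB plus some JSON framing overhead, which gives
   ; an automatic upper bound for the request size on that endpoint.
   ```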
   
   So I think I like 1.iii, and it seems 1.iii.2 is similar to the idea of 
automatically deducing a max http request size from the other limits and 
rejecting the update based on it. But we'd instead let users specify the more 
precise limits rather than use the max http request size as the primary 
constraint.
   
   (One exception, I guess, is if users somehow have a broken proxy or some 
middleware that cannot cope with large requests and they need to specifically 
adjust the http request size.)
   
   > 2
   
   I think we should increase max http request size, at least for 2.2.0, and 
document how users can apply limits to avoid DOS attacks and how max request 
size, max doc size and max attachment size are related. This would unbreak 
customers with 64Mb attachments; an example is sketched below. Maybe also add 
`max_attachments_per_doc` and `max_docs_per_bulk_doc_request`, though those 
could go into 2.3.0 perhaps.
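   
   As a concrete sketch of what "increase for 2.2.0" could look like (the value 
is only an example; users would pick one that matches their own attachment sizes):
   
   ```ini
   ; Example only.
   [httpd]                              ; may be [chttpd] depending on version
   max_http_request_size = 1073741824   ; 1 GiB, comfortably above 64Mb attachments
   ```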
   
   Some more random notes:
   
   Another way to handle some of the issues is to teach the replicator to bisect 
attachment PUTs just like it bisects `_bulk_docs` batches when it receives a 413 
response: 
https://github.com/apache/couchdb/blob/40b9f85f0be775fe5508f12332130f2695262595/src/couch_replicator/src/couch_replicator_worker.erl#L481-L489
   That would be a pain to write for the attachment multipart parser/streamer code 
though...
   
   Also it is good to keep in mind that the replicator could be running on a 
third cluster (not necessarily the target or the source) and it would need to 
handle older or alternative CouchDB implementations. In that respect it has to 
"auto-discover" settings by probing, bisecting and guessing.
   
   In the table above, technically in <2.0.0 we didn't have a max document size, 
only a max http request size. The setting was called max document size, but that 
was just a bad name, and as soon as we had a proper max document size we renamed 
it.
