nickva commented on issue #1200: [DISCUSS] CouchDB Request Size Limits
URL: https://github.com/apache/couchdb/pull/1200#issuecomment-370489809

Good write-up @janl

`max_http_request_size` is a quick and dirty way of specifying limits, but it is not very accurate and still has loopholes in it. Because we implemented other limits, its value has decreased. Moreover, because we fixed how it checks the limit and closed some loopholes, its default value started breaking users' code.

If the cost was the same, I'd think most users would rather pick the more specific limits (doc size, att size, max num of atts etc.) than think about those limits and then apply some formula to calculate a max http request size.

Maybe `max_http_request_size` should be adjusted to the upper bound of `max_doc_size + max_attachment_size * max_num_attachment_per_doc` expected by users. They are using attachments larger than 64Mb and they seem to work. We even fixed a few issues which caused uploads of large attachments to time out, so now chances are users would hit this limit more often. So maybe at least 500Mb (about 10x increase) or even 1G? Users who worry about DOS would have to apply limits as they see fit for their environment.

Replying to the summary specifically:

> 1.i & ii

From a user's perspective I think 1.i is buggy and is just as random a failure as 1.ii. Since when replicating the only indication of a failed write is a failed doc write count bump that the majority of users don't know about, the breakage is quite insidious. It might lead to invalid backups, and it might take years before users notice the missing data in their backups. So I'd pick 1.ii as better than 1.i from this point of view. A DOS is terrible but at least immediately apparent. Attachments mysteriously disappearing during replication, only to be discovered much later, is a more serious issue. I can already imagine the "My db ate my data" blog posts.

> 1.iii

I like the `max_attachments_per_doc` idea.
So given a max doc size, max attachment size and max num of attachments per doc, we could almost automatically generate a max http request size value for multipart doc PUTs, attachment PUTs and single document updates.

There is one more place where we'd need a limit to completely constrain the http request size based on the more precise limits: `_bulk_docs` requests. We'd have to restrict the max number of docs posted there. Then given max doc size + max num of docs per `_bulk_docs` request, we can automatically calculate a reasonable upper bound on request sizes for most "update/modify/create" APIs.

So I think I like 1.iii, and it seems 1.iii.2 is similar to the idea of automatically deducing a max http request size from the other limits and rejecting the update based on it. But we'd let users specify the more precise limits instead of using http max request size as the primary constraint. (One exception, I guess, is if users somehow have a broken proxy or some middleware that cannot cope with large requests and they need to specifically adjust the http request size.)

> 2

I think we should increase max http request size, at least for 2.2.0, and document how users can apply limits to avoid DOS attacks and how max request size, max doc size and max attachment sizes are related. This would unbreak customers with 64Mb attachments. Maybe add `max_attachments_per_doc` and `max_docs_per_bulk_doc_request`, though these could be in 2.3.0 perhaps.

Some more random notes:

Another way to handle some of the issues is to teach the replicator to bisect attachment PUTs just like it bisects `_bulk_docs` when it receives a 413 response:

https://github.com/apache/couchdb/blob/40b9f85f0be775fe5508f12332130f2695262595/src/couch_replicator/src/couch_replicator_worker.erl#L481-L489

That would be a pain to write for the attachment multipart parser streamer code though...
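To make the bisecting idea concrete, here is a minimal Python sketch of retrying a batch in halves whenever the target answers 413. It is only an illustration of the general technique, not the replicator's actual Erlang code. `send_with_bisect`, `send_batch` and `make_fake_target` are hypothetical names, and the fake target simply measures the JSON body size.

```python
import json
from typing import Callable, List, Sequence, Tuple


def send_with_bisect(
    docs: Sequence[dict],
    send_batch: Callable[[Sequence[dict]], int],
) -> Tuple[List[dict], List[dict]]:
    """Return (written, rejected), halving the batch on each 413 response."""
    if not docs:
        return [], []
    status = send_batch(docs)
    if status != 413:
        # To keep the sketch short, any non-413 status counts as accepted.
        return list(docs), []
    if len(docs) == 1:
        # A single doc that is still too large cannot be split any further.
        return [], list(docs)
    mid = len(docs) // 2
    written_left, rejected_left = send_with_bisect(docs[:mid], send_batch)
    written_right, rejected_right = send_with_bisect(docs[mid:], send_batch)
    return written_left + written_right, rejected_left + rejected_right


def make_fake_target(max_request_bytes: int) -> Callable[[Sequence[dict]], int]:
    """Hypothetical stand-in for a target that enforces a request size limit."""
    def send_batch(batch: Sequence[dict]) -> int:
        body = json.dumps({"docs": list(batch)}).encode("utf-8")
        return 413 if len(body) > max_request_bytes else 201
    return send_batch


docs = [
    {"_id": "a", "v": "x" * 10},
    {"_id": "b", "v": "x" * 10},
    {"_id": "c", "v": "x" * 500},  # too large even when sent alone
]
written, rejected = send_with_bisect(docs, make_fake_target(100))
```

In this sketch the oversized doc ends up in `rejected` while the two small ones are written. That rejected list is the part a real replicator would have to surface clearly, since a silently skipped doc is exactly the failure mode described above.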
Also, it is good to keep in mind that the replicator could be running on a 3rd cluster (not necessarily the target or source), and it would need to handle older or other alternative CouchDB implementations. In that respect it has to "auto-discover" settings by probing, bisecting and guessing.

In the table above, technically in <2.0.0 we didn't have a max document size, only a max http request size. The setting was called max document size, but that was just a bad name, and as soon as we had a proper max document size we renamed it.
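As a rough sketch of how a max http request size could be deduced from the more precise limits discussed above, the following Python computes the two upper bounds and takes the larger one. `Limits` and the helper functions are hypothetical names. `max_attachments_per_doc` and `max_docs_per_bulk_request` model the proposed settings, not existing ones. The arithmetic ignores multipart and JSON framing overhead, so a real bound would need some headroom.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass(frozen=True)
class Limits:
    max_doc_size: int               # bytes
    max_attachment_size: int        # bytes
    max_attachments_per_doc: int    # proposed limit, hypothetical here
    max_docs_per_bulk_request: int  # proposed limit, hypothetical here


def max_doc_put_size(limits: Limits) -> int:
    """Upper bound for a doc PUT: the doc body plus all of its attachments."""
    return (
        limits.max_doc_size
        + limits.max_attachment_size * limits.max_attachments_per_doc
    )


def max_bulk_docs_size(limits: Limits) -> int:
    """Upper bound for _bulk_docs: the doc body size times the doc count."""
    return limits.max_doc_size * limits.max_docs_per_bulk_request


def derived_max_http_request_size(limits: Limits) -> int:
    """The largest request any update API could legitimately send."""
    return max(max_doc_put_size(limits), max_bulk_docs_size(limits))


# Example values are assumptions chosen for illustration only.
limits = Limits(
    max_doc_size=8 * MIB,
    max_attachment_size=64 * MIB,
    max_attachments_per_doc=4,
    max_docs_per_bulk_request=100,
)
print(derived_max_http_request_size(limits) // MIB, "MiB")
```

With these example values the `_bulk_docs` bound (800 MiB) dominates the doc PUT bound (264 MiB), which shows why the bulk doc count needs its own limit before the request size can be fully constrained.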
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services