nickva commented on issue #1200: [DISCUSS] CouchDB Request Size Limits
URL: https://github.com/apache/couchdb/pull/1200#issuecomment-370489809
 
 
   Good write-up @janl 
   
   `max_http_request_size` is a quick and dirty way of specifying limits, but it is 
not very accurate and still has loopholes. Because we implemented other, more 
specific limits, its usefulness has decreased. Moreover, because we fixed how it 
checks the limit and closed some of those loopholes, its default value started 
breaking users' code. If the cost were the same, I think most users would rather 
set the more specific limits (doc size, attachment size, max number of 
attachments, etc.) than think of those limits and then apply some formula to 
calculate a max http request size.
   
   Maybe `max_http_request_size` should be adjusted to the upper bound expected by 
users: `max_doc_size + max_attachment_size * max_attachments_per_doc`. Users are 
already uploading attachments larger than 64Mb and those seem to work. We even 
fixed a few issues so that uploads of large attachments no longer time out, so 
chances are users will hit this limit more often now. So maybe at least 500Mb 
(about a 10x increase) or even 1G? Users who worry about DOS would have to apply 
limits as they see fit for their environment.
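   
   To make the arithmetic concrete, here is a rough config sketch. The numbers are 
purely illustrative (not defaults), `max_attachments_per_doc` is only a proposed 
setting and does not exist yet, and the section holding `max_http_request_size` 
may differ between versions:
   
   ```ini
   ; Sketch only: values are examples, and max_attachments_per_doc is a
   ; proposed setting, not an existing one.
   [couchdb]
   max_document_size = 1048576        ; 1 MiB per document body
   max_attachment_size = 67108864     ; 64 MiB per attachment
   max_attachments_per_doc = 8        ; proposed
   
   ; Upper bound for one multipart doc PUT: 1 MiB + 8 * 64 MiB = 513 MiB,
   ; so the request size limit could sit a bit above that.
   [httpd]                            ; may be [chttpd] depending on version
   max_http_request_size = 545259520  ; ~520 MiB
   ```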
   
   Replying to the summary specifically:
   
   > 1.i & ii
   
    From a user's perspective I think 1.i is buggy and just as random a failure as 
1.ii. Since during replication the only indication of a failed write is a bump of 
the failed doc write count, which the majority of users don't know about, the 
breakage is quite insidious. It might lead to invalid backups, and it might take 
years before users notice the missing data in their backups. So from this point 
of view I'd pick 1.ii as better than 1.i. A DOS is terrible but at least 
immediately apparent. Attachments mysteriously disappearing during replication, 
only to be discovered much later, is a more serious issue. I can already imagine 
the "My db ate my data" blog posts.
   
   > 1.iii
   
   I like the `max_attachments_per_doc` idea. Given a max doc size, a max 
attachment size and a max number of attachments per doc, we could almost 
automatically generate a max http request size value for multipart doc PUTs, 
attachment PUTs and single document updates.
   
   There is one more place where we'd need a limit to completely constrain the 
http request size based on the more precise limits: `_bulk_docs` requests. We'd 
have to restrict the max number of docs posted there. Then, given the max doc 
size and the max number of docs per `_bulk_docs` request, we can automatically 
calculate a reasonable upper bound on request sizes for most 
"update/modify/create" APIs, as sketched below.
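   
   A minimal sketch of that arithmetic, using a hypothetical 
`max_docs_per_bulk_doc_request` setting (the name and numbers are made up here):
   
   ```ini
   ; Hypothetical sketch: neither the setting name nor the values are real defaults.
   [couchdb]
   max_document_size = 1048576            ; 1 MiB
   max_docs_per_bulk_doc_request = 1000   ; proposed cap on _bulk_docs batch size
   
   ; A _bulk_docs request would then be bounded by roughly
   ; 1000 * 1 MiB ~= 1000 MiB plus some JSON framing overhead, which gives
   ; an automatic upper bound for the request size on that endpoint.
   ```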
   
   So I think I like 1.iii, and it seems 1.iii.2 is similar to the idea of 
automatically deducing a max http request size from the other limits and 
rejecting the update based on it. But we'd instead let users specify the more 
precise limits rather than use the max http request size as the primary 
constraint.
   
   (One exception, I guess, is if users somehow have a broken proxy or some 
middleware that cannot cope with large requests and they need to specifically 
adjust the http request size.)
   
   > 2
   
   I think we should increase max http request size, at least for 2.2.0, and 
document how users can apply limits to avoid DOS attacks and how max request 
size, max doc size and max attachment size are related. This would unbreak 
customers with 64Mb attachments; an example is sketched below. Maybe also add 
`max_attachments_per_doc` and `max_docs_per_bulk_doc_request`, though those 
could go into 2.3.0 perhaps.
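   
   As a concrete sketch of what "increase for 2.2.0" could look like (the value 
is only an example; users would pick one that matches their own attachment sizes):
   
   ```ini
   ; Example only.
   [httpd]                              ; may be [chttpd] depending on version
   max_http_request_size = 1073741824   ; 1 GiB, comfortably above 64Mb attachments
   ```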
   
   Some more random notes:
   
   Another way to handle some of the issues is to teach the replicator to bisect 
attachment PUTs just like it bisects `_bulk_docs` batches when it receives a 413 
response: 
https://github.com/apache/couchdb/blob/40b9f85f0be775fe5508f12332130f2695262595/src/couch_replicator/src/couch_replicator_worker.erl#L481-L489
   That would be a pain to write for the attachment multipart parser/streamer code 
though...
   
   Also it is good to keep in mind that the replicator could be running on a 
third cluster (not necessarily the target or the source) and it would need to 
handle older or alternative CouchDB implementations. In that respect it has to 
"auto-discover" settings by probing, bisecting and guessing.
   
   In the table above, technically in <2.0.0 we didn't have a max document size, 
only a max http request size. The setting was called max document size, but that 
was just a bad name, and as soon as we had a proper max document size we renamed 
it.
