GitHub user tonysun83 opened a pull request:
https://github.com/apache/couchdb-chttpd/pull/156
Introduce max_http_request_size to replace max_document_size
This PR consolidates ideas from
https://github.com/apache/couchdb-chttpd/pull/114 (the discussion there got
long and convoluted) and finalizes the implementation details.
**Background**
Users sometimes want to limit the size of the actual documents sent in various
requests. Currently, our ```max_document_size``` is a misnomer: it actually
limits the size of the HTTP request body. So a ```_bulk_docs``` request with
multiple docs, or a doc with attachments, is counted in full against
```max_document_size```. The name is misleading.
**Requirements**
This feature should meet the following requirements:
R1) Use config parameter names that actually reflect their intentions.
R2) Prevent DDOS attacks.
R3) Do not leave loopholes that let users bypass the restriction and thereby
create weird scenarios. We see this mostly in replication, where attachments
are sent in multipart requests.
R4) Do not reject a document that is actually valid. We see this scenario in
replication as well. Replication needs its own section, so it is discussed
further down.
**Proposal**
P1) Replace ```max_document_size``` with ```max_http_request_size```. This
initial change will serve the same purpose as before except with a different
name. It meets requirements R1 and R2 above.
P2) Actually use ```max_document_size``` for update requests. So for
PUT/POST requests that may or may not include an attachment, we only look at
the document itself. For ```_bulk_docs```, if any document exceeds
```max_document_size```, we return an error with that document's id, and the
whole ```_bulk_docs``` request is rejected. For ```_update``` handlers, the
same restriction is applied so that the updated document must not exceed the
limit. This meets most of the R3 requirement, but multipart requests used by
replication remain a problem (more on that in the Replication section).
P3) Possibly introduce a ```use_max_document_size``` parameter so that
document size computations can be skipped when the check is not needed.
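To make the difference between P1 and P2 concrete, here is a minimal Python sketch (not chttpd's Erlang code) of a ```_bulk_docs``` handler that applies both limits. Everything in it is a hypothetical illustration: the helper names, the exception classes, the tiny limit values, and the assumption that a document's size is the byte length of its compact JSON encoding (the real size computation may differ). The demo shows the R4 scenario: a 291-byte document passes a 300-byte per-document limit, but its ```_bulk_docs``` wrapper is 302 bytes and would have been rejected under the old request-body meaning of ```max_document_size```.

```python
import json

# Hypothetical limits chosen to make the example small; not CouchDB defaults.
MAX_HTTP_REQUEST_SIZE = 1024  # P1: limit on the whole HTTP body
MAX_DOCUMENT_SIZE = 300       # P2: limit on each individual document


class RequestTooLarge(Exception):
    pass


class DocumentTooLarge(Exception):
    def __init__(self, doc_id):
        super().__init__(f"document {doc_id!r} exceeds max_document_size")
        self.doc_id = doc_id


def doc_size(doc):
    """Assumed size metric: bytes of the document's compact JSON encoding."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


def check_bulk_docs(body: bytes):
    # P1: reject the request outright if the raw body is too large.
    if len(body) > MAX_HTTP_REQUEST_SIZE:
        raise RequestTooLarge(len(body))
    # P2: check each document on its own; one oversized doc rejects
    # the whole _bulk_docs request and reports that doc's id.
    docs = json.loads(body)["docs"]
    for doc in docs:
        if doc_size(doc) > MAX_DOCUMENT_SIZE:
            raise DocumentTooLarge(doc.get("_id"))
    return docs


if __name__ == "__main__":
    doc = {"_id": "a", "data": "x" * 270}
    body = json.dumps({"docs": [doc]}, separators=(",", ":")).encode("utf-8")
    print(doc_size(doc), len(body))  # 291 302
    # Accepted: the doc is under MAX_DOCUMENT_SIZE even though the body is not.
    check_bulk_docs(body)
```

Checking each document separately is what lets a near-limit document survive being wrapped in ```_bulk_docs``` or decorated with extra query parameters, while the coarser request limit still guards against oversized payloads (R2).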
**Replication**
The new restriction has a big impact on replication for several reasons.
1) Before the following fix:
https://github.com/apache/couchdb-couch-replicator/pull/49/files,
replications would crash when the request limit was set too low. Now that we're
changing the meaning to use actual document size instead of request size, we
may have to revisit that fix.
2) Currently, a rare scenario exists during replication when a customer has
documents very close to the old, misnamed ```max_document_size```. When we add
extra query parameters or the doc is wrapped in ```_bulk_docs```, the request
size exceeds the old ```max_document_size``` config, and replication fails. By
changing the name and actually looking at the documents themselves, we should
be able to avoid the scenario described in R4.
3) Replication uses multipart requests when attachments are included with
the document, so the request is streamed. This means we can't use document size
as the restriction, because we would have to wait for the stream to finish
before extracting the document from the request. @davisp suggested we read
from the socket and fail once the number of bytes read from the stream exceeds
```max_document_size```. However, the details are still unclear to me; I need
to look at how this socket restriction will work with attachments and headers.
The bottom line is that replication must work such that users can't use
attachments to bypass the restriction while, at the same time, only the actual
document counts against the limit.
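The socket-based idea in item 3 can be sketched as a reader that counts bytes as they arrive and fails as soon as the running total passes the limit, instead of buffering the whole multipart body first. This is a minimal Python illustration under stated assumptions, not the chttpd implementation: ```LimitedReader``` and ```LimitExceeded``` are hypothetical names, and an in-memory ```io.BytesIO``` stands in for the socket.

```python
import io


class LimitExceeded(Exception):
    pass


class LimitedReader:
    """Hypothetical wrapper: fails once more than `limit` bytes are read."""

    def __init__(self, stream, limit):
        self._stream = stream
        self._limit = limit
        self.bytes_read = 0

    def read(self, n=-1):
        chunk = self._stream.read(n)
        self.bytes_read += len(chunk)
        # Fail on the first chunk that pushes the total over the limit,
        # without waiting for the rest of the stream.
        if self.bytes_read > self._limit:
            raise LimitExceeded(f"stream exceeded {self._limit} bytes")
        return chunk


def consume(stream, limit, chunk_size=64):
    """Read the whole stream in chunks; return the total byte count."""
    reader = LimitedReader(stream, limit)
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            return reader.bytes_read


if __name__ == "__main__":
    reader = LimitedReader(io.BytesIO(b"x" * 10_000), limit=100)
    try:
        while reader.read(64):
            pass
    except LimitExceeded:
        print(reader.bytes_read)  # 128: stopped after two chunks, not 10,000 bytes
```

Note that this sketch counts every byte on the wire, including multipart boundaries, part headers, and attachment data. That is exactly the open question above: to limit only the document part, the reader would need to understand the multipart framing and apply the counter to the JSON part alone.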
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloudant/couchdb-chttpd 64299-add-new-request-parameter
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/couchdb-chttpd/pull/156.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #156