[
https://issues.apache.org/jira/browse/COUCHDB-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858340#comment-15858340
]
ASF GitHub Bot commented on COUCHDB-3291:
-----------------------------------------
Github user asfgit closed the pull request at:
https://github.com/apache/couchdb-couch-replicator/pull/55
> Excessively long document IDs prevent replicator from making progress
> ---------------------------------------------------------------------
>
> Key: COUCHDB-3291
> URL: https://issues.apache.org/jira/browse/COUCHDB-3291
> Project: CouchDB
> Issue Type: Bug
> Reporter: Nick Vatamaniuc
>
> Currently there is no protection in CouchDB against creating document IDs
> which are too long, so large IDs hit various implicit limits, which usually
> results in unpredictable failure modes.
> One such implicit limit is hit in the replicator code. The replicator usually
> handles document IDs in bulk-like calls: it gets them via the changes feed,
> computes revs_diffs in a POST, or inserts them with bulk_docs. The exception
> is when it fetches open_revs, where it uses a single GET request. That request
> fails because of a bug / limitation in the HTTP parser: the first GET line of
> the HTTP request has to fit in the receive buffer of the receiving socket.
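> For example, the open_revs fetch is roughly a single request like the one
> below, so the document ID appears verbatim in the first request line (the
> path and query parameters here are illustrative, not copied from the code):
> {code}
> GET /source_db/<very-long-doc-id>?open_revs=["1-abc123"]&revs=true&latest=true HTTP/1.1
> {code}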
> Increasing that buffer allows larger HTTP request lines to pass through. In
> the configuration options it can be set as
> {code}
> chttpd.server_options="[...,{recbuf, 32768},...]"
> {code}
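> For reference, the same setting in the ini file would look roughly like this
> (assuming the option goes in the [chttpd] section; any existing
> server_options entries would need to be preserved):
> {code}
> [chttpd]
> server_options = [{recbuf, 32768}]
> {code}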
> Steve Vinoski mentions something about a possible bug in the HTTP packet
> parser code as well:
> http://erlang.org/pipermail/erlang-questions/2011-June/059567.html
> Tracing this a bit, I see that a proper mochiweb request is never even created
> and the request hangs instead, which confirms it further. It seems in the code
> here:
> https://github.com/apache/couchdb-mochiweb/blob/bd6ae7cbb371666a1f68115056f7b30d13765782/src/mochiweb_http.erl#L90
> the timeout clause is hit. Adding a catch-all clause, I get the
> {tcp_error,#Port<0.40682>,emsgsize} message which we don't handle. That seems
> like a sane place to return a 413 or similar.
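> A minimal sketch of what handling that message could look like (illustrative
> only, not the actual mochiweb code; the module and function names are made
> up), assuming an {active, once} socket in {packet, http} mode as in the trace
> above:
> {code}
> -module(emsgsize_sketch).
> -export([handle_request/1]).
>
> handle_request(Socket) ->
>     ok = inet:setopts(Socket, [{active, once}, {packet, http}]),
>     receive
>         {http, Socket, {http_request, _Method, _Path, _Version}} ->
>             %% Normal case: the request line fit; carry on parsing headers.
>             ok;
>         {tcp_error, Socket, emsgsize} ->
>             %% The request line did not fit in the receive buffer; answer
>             %% with a 413 instead of hanging until the timeout below fires.
>             gen_tcp:send(Socket,
>                 <<"HTTP/1.1 413 Request Entity Too Large\r\n\r\n">>),
>             gen_tcp:close(Socket);
>         {tcp_closed, Socket} ->
>             ok
>     after 30000 ->
>         gen_tcp:close(Socket)
>     end.
> {code}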
> There are probably multiple ways to address the issue:
> * Increase the mochiweb listener buffer to fit larger doc IDs. However, that
> is a separate bug, and using it to control document ID size during replication
> is not reliable. Moreover, it would allow larger IDs to propagate through the
> system during replication, and then every future replication source would
> have to be configured with the same maximum recbuf value.
> * Introduce a validation step in {code} couch_doc:validate_docid {code} (see
> the sketch after this list). Currently that code doesn't read from config
> files and is in the hot path, so adding a config read there might reduce
> performance. If enabled, this would stop new documents with large IDs from
> being created, but we would have to decide how to handle already existing IDs
> which are larger than the limit.
> * Introduce a validation/bypass in the replicator. Specifically targeting the
> replicator might help prevent propagation of large IDs during replication.
> There is already a similar case of skipping writes of large attachments or
> large documents (ones which exceed the request size) and bumping {code}
> doc_write_failures {code}.
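> To illustrate the second option, a rough sketch of what such a check could
> look like (the "max_document_id_length" key, the 512 default, and the helper
> name are made up here; config:get_integer/3 is assumed to be available as in
> the couch config application):
> {code}
> %% Sketch of a length check that could sit in couch_doc:validate_docid/1.
> validate_docid_length(Id) when is_binary(Id) ->
>     MaxLen = config:get_integer("couchdb", "max_document_id_length", 512),
>     case byte_size(Id) =< MaxLen of
>         true ->
>             ok;
>         false ->
>             throw({bad_request, <<"Document id is too long">>})
>     end.
> {code}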