[
https://issues.apache.org/jira/browse/COUCHDB-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858340#comment-15858340
]
ASF GitHub Bot commented on COUCHDB-3291:
-----------------------------------------
Github user asfgit closed the pull request at:
https://github.com/apache/couchdb-couch-replicator/pull/55
> Excessively long document IDs prevent replicator from making progress
> ---------------------------------------------------------------------
>
> Key: COUCHDB-3291
> URL: https://issues.apache.org/jira/browse/COUCHDB-3291
> Project: CouchDB
> Issue Type: Bug
> Reporter: Nick Vatamaniuc
>
> Currently there is no protection in CouchDB against creating document IDs
> which are too long, so large IDs hit various implicit limits, which usually
> results in unpredictable failure modes.
> One such implicit limit is hit in the replicator code. The replicator usually
> handles document IDs in bulk-like calls: it gets them via the changes feed,
> computes revs_diffs in a POST, or inserts them with bulk_docs. The exception
> is when it fetches open_revs, where it uses a single GET request. That request
> fails because of a bug / limitation in the HTTP parser: the first GET line of
> the HTTP request has to fit in the receive buffer of the receiving socket.
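> For example, the open_revs fetch is roughly a single request like the one
> below, so the document ID appears verbatim in the first request line (the
> path and query parameters here are illustrative, not copied from the code):
> {code}
> GET /source_db/<very-long-doc-id>?open_revs=["1-abc123"]&revs=true&latest=true HTTP/1.1
> {code}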
> Increasing that buffer allows larger HTTP request lines to pass through. In
> the configuration options it can be set as
> {code}
> chttpd.server_options="[...,{recbuf, 32768},...]"
> {code}
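> For reference, the same setting in the ini file would look roughly like this
> (assuming the option goes in the [chttpd] section; any existing
> server_options entries would need to be preserved):
> {code}
> [chttpd]
> server_options = [{recbuf, 32768}]
> {code}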
> Steve Vinoski mentions something about a possible bug in the HTTP packet
> parser code as well:
> http://erlang.org/pipermail/erlang-questions/2011-June/059567.html
> Tracing this a bit, I see that a proper mochiweb request is never even created
> and the request hangs instead, which confirms it further. It seems in the code
> here:
> https://github.com/apache/couchdb-mochiweb/blob/bd6ae7cbb371666a1f68115056f7b30d13765782/src/mochiweb_http.erl#L90
> the timeout clause is hit. Adding a catch-all clause, I get the
> {tcp_error,#Port<0.40682>,emsgsize} message which we don't handle. That seems
> like a sane place to return a 413 or similar.
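> A minimal sketch of what handling that message could look like (illustrative
> only, not the actual mochiweb code; the module and function names are made
> up), assuming an {active, once} socket in {packet, http} mode as in the trace
> above:
> {code}
> -module(emsgsize_sketch).
> -export([handle_request/1]).
>
> handle_request(Socket) ->
>     ok = inet:setopts(Socket, [{active, once}, {packet, http}]),
>     receive
>         {http, Socket, {http_request, _Method, _Path, _Version}} ->
>             %% Normal case: the request line fit; carry on parsing headers.
>             ok;
>         {tcp_error, Socket, emsgsize} ->
>             %% The request line did not fit in the receive buffer; answer
>             %% with a 413 instead of hanging until the timeout below fires.
>             gen_tcp:send(Socket,
>                 <<"HTTP/1.1 413 Request Entity Too Large\r\n\r\n">>),
>             gen_tcp:close(Socket);
>         {tcp_closed, Socket} ->
>             ok
>     after 30000 ->
>         gen_tcp:close(Socket)
>     end.
> {code}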
> There are probably multiple ways to address the issue:
> * Increase the mochiweb listener buffer to fit larger doc IDs. However, that
> is a separate bug, and using it to control document ID size during replication
> is not reliable. Moreover, it would allow larger IDs to propagate through the
> system during replication, and then every future replication source would
> have to be configured with the same maximum recbuf value.
> * Introduce a validation step in {code} couch_doc:validate_docid {code} (see
> the sketch after this list). Currently that code doesn't read from config
> files and is in the hot path, so adding a config read there might reduce
> performance. If enabled, this would stop new documents with large IDs from
> being created, but we would have to decide how to handle already existing IDs
> which are larger than the limit.
> * Introduce a validation/bypass in the replicator. Specifically targeting the
> replicator might help prevent propagation of large IDs during replication.
> There is already a similar case of skipping writes of large attachments or
> large documents (ones which exceed the request size) and bumping {code}
> doc_write_failures {code}.
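> To illustrate the second option, a rough sketch of what such a check could
> look like (the "max_document_id_length" key, the 512 default, and the helper
> name are made up here; config:get_integer/3 is assumed to be available as in
> the couch config application):
> {code}
> %% Sketch of a length check that could sit in couch_doc:validate_docid/1.
> validate_docid_length(Id) when is_binary(Id) ->
>     MaxLen = config:get_integer("couchdb", "max_document_id_length", 512),
>     case byte_size(Id) =< MaxLen of
>         true ->
>             ok;
>         false ->
>             throw({bad_request, <<"Document id is too long">>})
>     end.
> {code}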