On Mon, Jan 27, 2014 at 6:14 AM, Benoit Chesneau wrote:
> On Mon, Jan 27, 2014 at 3:08 AM, Alexander Shorin wrote:
>
>> On Mon, Jan 27, 2014 at 5:56 AM, Benoit Chesneau wrote:
>> > On Mon, Jan 27, 2014 at 2:50 AM, Alexander Shorin wrote:
>> >
>> >> On Sun, Jan 26, 2014 at 6:44 PM, Dirkjan Ochtman wrote:
>> >> > On Wed, Jan 22, 2014 at 9:22 PM, Dirkjan Ochtman wrote:
>> >> >>> - Action: kocolosk and/or Kxepal will try to look into and solve
>> >> >>> the COUCHDB-1986 issue
>> >> >
>> >> > Any progress so far?
>> >>
>> >> I'm very sure that this is related to Erlang itself, since I failed
>> >> to reproduce the issue on FreeBSD 9.1 (SpiderMonkey 1.7.0, Erlang
>> >> R15B02, VirtualBox guest) over a long series of test runs, while it
>> >> always occurs on FreeBSD 10 with Erlang R16B02, which Dave gave me
>> >> for testing. I also tried running this test in the same environment
>> >> against older releases to locate the breaking commit, but our 1.5
>> >> and 1.4 releases are affected by the same issue. Again, everything
>> >> is fine on the host with R15.
>> >
>> > Well, see my comments and the one from dch. It may have a cause other
>> > than Erlang; most probably it's something deep in the couch_replicator
>> > code. The latest changes in it make the problem disappear on my
>> > machine, while Andy was still able to reproduce it. So something is
>> > preventing the replicator from timing out correctly.
>>
>> About a side effect from COUCHDB-1953? I'm not sure that it's related
>> (though it could have introduced an accidental "fix", since attachment
>> replication became faster), because for now I see that this issue
>> strongly depends on the OS and Erlang version.
>
> No, see the *latest* comment:
>
> https://issues.apache.org/jira/browse/COUCHDB-1986?focusedCommentId=13882243
>
> What I am saying is that even if the fix is unrelated, it actually
> fixes this error on my Mac. I am pretty sure that this error doesn't
> happen on other systems because they are fast enough. I am actually
> wondering what is preventing it from timing out. Also note that it was
> harder to reproduce in 1.5, so...
Couldn't say why, but I can certainly say where:
https://github.com/apache/couchdb/blob/master/src/couch_replicator/src/couch_replicator_httpc.erl#L65

Changing infinity there to some reasonable finite value (say 10, 20 or 30 seconds) makes the replicator fail with a timeout error instead of waiting forever for the response.

--
,,,^..^,,,
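To illustrate the difference outside CouchDB, here is a minimal Python sketch. It is not the couch_replicator code: the function wait_for_response, the queue standing in for the HTTP worker's reply, and the slow_worker thread are all hypothetical. Passing timeout=None blocks indefinitely, analogous to Erlang's infinity, while a finite timeout turns a hung request into an explicit timeout error that the caller can handle or retry.

```python
import queue
import threading
import time


def wait_for_response(replies, timeout):
    """Wait for a reply from a hypothetical HTTP worker.

    timeout=None blocks indefinitely (analogous to Erlang's `infinity`);
    a number of seconds bounds the wait and yields a timeout error instead.
    """
    try:
        return ("ok", replies.get(timeout=timeout))
    except queue.Empty:
        # No reply arrived in time: report an error rather than hang forever.
        return ("error", "timeout")


def slow_worker(replies, delay):
    # Hypothetical stand-in for a slow server: replies after `delay` seconds.
    time.sleep(delay)
    replies.put("200 OK")


if __name__ == "__main__":
    replies = queue.Queue()
    threading.Thread(target=slow_worker, args=(replies, 0.2), daemon=True).start()
    # Gives up after 0.05 s, before the worker has replied.
    print(wait_for_response(replies, timeout=0.05))
    # Waits up to 1 s; the reply arrives after roughly 0.2 s.
    print(wait_for_response(replies, timeout=1.0))
```

The same idea would apply to whatever wait uses infinity at the linked line; the thread does not quote that code, so which call it is remains an assumption here. Choosing the value is a trade-off between failing fast on a stuck connection and tolerating legitimately slow responses such as large attachments.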
