Noah, was able to reproduce your issue by tweaking the test to create
more leaf revisions for a document:
diff --git a/test/etap/242-replication-many-leaves.t
b/test/etap/242-replication-many-leaves.t
index d8d3eb9..4eb4765 100755
--- a/test/etap/242-replication-many-leaves.t
+++ b/test/etap/242-replication-many-leaves.t
@@ -56,7 +56,7 @@ doc_ids() ->
doc_num_conflicts(<<"doc1">>) -> 10;
doc_num_conflicts(<<"doc2">>) -> 100;
-doc_num_conflicts(<<"doc3">>) -> 286.
+doc_num_conflicts(<<"doc3">>) -> 500.
main(_) ->
With that change, I get exactly the same timeout as you get when the
test runs the push replication. It turns out that some _bulk_docs
requests are taking more than 30 seconds (default replication
connection timeout) therefore the replication request retry messages.
Verified this by timing the _bulk_docs handler to log the time it
takes:
diff --git a/src/couchdb/couch_httpd_db.erl b/src/couchdb/couch_httpd_db.erl
index d7ecb4a..442571d 100644
--- a/src/couchdb/couch_httpd_db.erl
+++ b/src/couchdb/couch_httpd_db.erl
@@ -297,6 +297,7 @@
db_req(#httpd{path_parts=[_,<<"_ensure_full_commit">>]}=Req, _Db) ->
send_method_not_allowed(Req, "POST");
db_req(#httpd{method='POST',path_parts=[_,<<"_bulk_docs">>]}=Req, Db) ->
+ T0 = now(),
couch_stats_collector:increment({httpd, bulk_requests}),
couch_httpd:validate_ctype(Req, "application/json"),
{JsonProps} = couch_httpd:json_body_obj(Req),
@@ -357,7 +358,9 @@
db_req(#httpd{method='POST',path_parts=[_,<<"_bulk_docs">>]}=Req, Db)
->
{ok, Errors} = couch_db:update_docs(Db, Docs, Options,
replicated_changes),
ErrorsJson =
lists:map(fun update_doc_result_to_json/1, Errors),
- send_json(Req, 201, ErrorsJson)
+ Rr = send_json(Req, 201, ErrorsJson),
+ ?LOG_ERROR("BULK DOCS took ~p ms~n",
[timer:now_diff(now(), T0) / 1000]),
+ Rr
end
end;
db_req(#httpd{path_parts=[_,<<"_bulk_docs">>]}=Req, _Db) ->
I was getting _bulk_docs response times after 50 seconds.
This convinces me there's nothing wrong with the codebase, the
timeouts just needs to be increased:
diff --git a/test/etap/242-replication-many-leaves.t
b/test/etap/242-replication-many-leaves.t
index d8d3eb9..6508112 100755
--- a/test/etap/242-replication-many-leaves.t
+++ b/test/etap/242-replication-many-leaves.t
@@ -77,6 +77,7 @@ test() ->
couch_server_sup:start_link(test_util:config_files()),
ibrowse:start(),
crypto:start(),
+ couch_config:set("replicator", "connection_timeout", "90000", false),
Pairs = [
{source_db_name(), target_db_name()},
@@ -287,6 +288,6 @@ replicate(Source, Target) ->
receive
{'DOWN', MonRef, process, Pid, Reason} ->
etap:is(Reason, normal, "Replication finished successfully")
- after 300000 ->
+ after 900000 ->
etap:bail("Timeout waiting for replication to finish")
end.
Alternatively the test can be updated to create less revisions for the
document doc3. The current revisions # is 286 but for the tests'
purpose 205+ is enough, which should make it faster - 7000 (max url
length) / length(DocRevision) = 205
If it's ok for you, updating the timeouts plus reducing the # from 286
to 210 is fine for me.
On Mon, Jan 23, 2012 at 12:00 AM, Noah Slater <[email protected]> wrote:
> I'm just the dumb QA guy.
>
> If you have some diagnostics you want me to run on my machine, I am happy
> to.
>
> On Sun, Jan 22, 2012 at 11:31 PM, Filipe David Manana
> <[email protected]>wrote:
>
>> On Sun, Jan 22, 2012 at 7:20 PM, Noah Slater <[email protected]> wrote:
>> > Works. How do we proceed?
>>
>> For how much time does the test runs? On 2 different physical
>> machines, it takes about 1 minute and 10 seconds for me.
>>
>> Perhaps some manual replication tests could confirm if there's
>> something wrong with the codebase, your environment or if simply
>> increasing the timeout is not alarming.
>>
>> >
>> > On Sun, Jan 22, 2012 at 7:05 PM, Noah Slater <[email protected]>
>> wrote:
>> >
>> >> OVAR 9000! (Testing now...)
>> >>
>> >>
>> >> On Sun, Jan 22, 2012 at 6:56 PM, Filipe David Manana <
>> [email protected]>wrote:
>> >>
>> >>> On Sun, Jan 22, 2012 at 6:47 PM, Noah Slater <[email protected]>
>> >>> wrote:
>> >>> > No change, still fails.
>> >>>
>> >>> Noah, to try to find out if it's due to slowness of the machine or
>> >>> some other issue, do you think you can try to increase the following
>> >>> timeout in the test?
>> >>>
>> >>> diff --git a/test/etap/242-replication-many-leaves.t
>> >>> b/test/etap/242-replication-many-leaves.t
>> >>> index d8d3eb9..737cd31 100755
>> >>> --- a/test/etap/242-replication-many-leaves.t
>> >>> +++ b/test/etap/242-replication-many-leaves.t
>> >>> @@ -287,6 +287,6 @@ replicate(Source, Target) ->
>> >>> receive
>> >>> {'DOWN', MonRef, process, Pid, Reason} ->
>> >>> etap:is(Reason, normal, "Replication finished successfully")
>> >>> - after 300000 ->
>> >>> + after 900000 ->
>> >>> etap:bail("Timeout waiting for replication to finish")
>> >>> end.
>> >>>
>> >>> >
>> >>> > On Sun, Jan 22, 2012 at 6:08 PM, Noah Slater <[email protected]>
>> >>> wrote:
>> >>> >
>> >>> >>
>> >>> >> On Sun, Jan 22, 2012 at 6:01 PM, Filipe David Manana <
>> >>> [email protected]>wrote:
>> >>> >>
>> >>> >>> Noah, does it fail occasionally or every time for you?
>> >>> >>>
>> >>> >>
>> >>> >> Fails every time.
>> >>> >>
>> >>> >>
>> >>> >>> I'm assuming you're with a slow machine or the machine is a bit
>> >>> >>> overloaded.
>> >>> >>>
>> >>> >>
>> >>> >> Shouldn't be, I'm not doing anything else right now, and this is a
>> new
>> >>> MBA.
>> >>> >>
>> >>> >>
>> >>> >>> Can you try with the following patch?
>> >>> >>>
>> >>> >>
>> >>> >> Yes. Will report back.
>> >>> >>
>> >>> >>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Filipe David Manana,
>> >>>
>> >>> "Reasonable men adapt themselves to the world.
>> >>> Unreasonable men adapt the world to themselves.
>> >>> That's why all progress depends on unreasonable men."
>> >>>
>> >>
>> >>
>>
>>
>>
>> --
>> Filipe David Manana,
>>
>> "Reasonable men adapt themselves to the world.
>> Unreasonable men adapt the world to themselves.
>> That's why all progress depends on unreasonable men."
>>
--
Filipe David Manana,
"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."